CN110837479A - Data processing method, related device and computer storage medium - Google Patents

Data processing method, related device and computer storage medium

Info

Publication number
CN110837479A
Authority
CN
China
Prior art keywords
data
read
written
information
address
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810940730.8A
Other languages
Chinese (zh)
Other versions
CN110837479B (en)
Inventor
饶蓉
魏明昌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Cloud Computing Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN201810940730.8A
Priority to PCT/CN2019/089582 (published as WO2020034729A1)
Publication of CN110837479A
Application granted
Publication of CN110837479B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F 12/02 Addressing or allocation; Relocation
    • G06F 12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F 12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F 12/0866 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches for peripheral storage systems, e.g. disk cache
    • G06F 12/0868 Data transfer between cache memory and other subsystems, e.g. storage devices or host systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention disclose a data processing method in which an object storage device receives a data read or write IO request. The request includes a data integrity field (DIF), and the DIF carries attribute information of the data to be operated on, specifically the attribute information of the data to be read or of the data to be written. The object storage device then processes the data to be operated on according to this attribute information. By adopting the embodiments of the invention, disk performance can be improved and waste of storage resources can be avoided.

Description

Data processing method, related device and computer storage medium
Technical Field
The present invention relates to the field of storage technologies, and in particular, to a data processing method, a related device, and a computer storage medium.
Background
With the development of the internet, the global data volume grows at an extremely rapid rate every day. Faced with such massive data, the demands placed on data storage systems are substantial. At present, distributed storage systems are generally adopted to meet the storage requirements of mass data.
However, in practice it is found that current distributed storage systems store data using a uniform storage strategy and storage layout, which results in loss of disk performance and waste of storage resources.
Disclosure of Invention
Embodiments of the invention disclose a data processing method, related devices and a computer storage medium, which reduce disk performance loss and the waste of storage resources.
In a first aspect, an embodiment of the present invention discloses a data processing method. The method includes: an object storage device receives a data read IO request sent by a host, where the data read IO request includes a data integrity field (DIF), and the DIF carries attribute information of the data to be read, such as the size of the data to be read, the read granularity of the data to be read, and read information of the address to be read corresponding to the data to be read. Accordingly, the object storage device can process the data to be read according to the attribute information of the data to be read.
By implementing this embodiment of the invention, the data to be read can be stored and managed according to its attribute information, and personalized storage can be performed based on the characteristics of the data, avoiding problems such as reduced disk performance and wasted storage resources.
In some possible embodiments, the attribute information of the data to be read includes the read granularity of the data to be read, and the read granularity indicates the size of the pre-read data, which includes the data to be read. Accordingly, if the size of the pre-read data is greater than or equal to a first threshold, the object storage device may read the pre-read data containing the data to be read and send it to the client so that it is cached in the client's memory. The next time the host issues the same data read IO request, the corresponding data to be read can then be read directly from the client's cache, saving read time and improving data processing efficiency.
If the size of the pre-read data is smaller than the first threshold, the object storage device may directly read the data to be read and send it to the host.
By implementing this embodiment of the invention, whether to perform pre-reading can be decided according to the size of the pre-read data, so that the data can later be read directly from the client's memory, saving read time and improving data processing efficiency.
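A minimal sketch of this pre-read decision follows (in C; the helper functions, buffer handling and the 64 KB value of the first threshold are illustrative assumptions, not taken from the patent):

    #include <stdio.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define PRE_READ_THRESHOLD (64u * 1024u)   /* illustrative "first threshold" (64 KB) */

    /* Stand-ins for the real IO paths, which the patent does not specify. */
    static void read_from_disk(uint64_t lba, size_t len, uint8_t *buf) {
        (void)buf;
        printf("read %zu bytes starting at LBA %llu\n", len, (unsigned long long)lba);
    }
    static void send_to_client_cache(const uint8_t *buf, size_t len) {
        (void)buf;
        printf("cache %zu bytes in the client's memory\n", len);
    }
    static void send_to_host(const uint8_t *buf, size_t len) {
        (void)buf;
        printf("return %zu bytes to the host\n", len);
    }

    /* to_read_len: size requested by this data read IO request;
     * pre_read_len: size of the pre-read data indicated by the read granularity in the DIF. */
    static void handle_read(uint64_t lba, size_t to_read_len, size_t pre_read_len) {
        uint8_t *buf = malloc(pre_read_len > to_read_len ? pre_read_len : to_read_len);
        if (buf == NULL)
            return;
        if (pre_read_len >= PRE_READ_THRESHOLD) {
            /* Large pre-read range: fetch it all and push it to the client's memory cache,
             * so a later identical read IO request can be served from that cache. */
            read_from_disk(lba, pre_read_len, buf);
            send_to_client_cache(buf, pre_read_len);
        } else {
            /* Small pre-read range: just return the requested data to the host. */
            read_from_disk(lba, to_read_len, buf);
            send_to_host(buf, to_read_len);
        }
        free(buf);
    }

    int main(void) {
        handle_read(1000, 4096, 1024u * 1024u);  /* 1 MB pre-read: cache at the client */
        handle_read(2000, 4096, 4096);           /* small pre-read: return to the host */
        return 0;
    }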
In some possible embodiments, the attribute information of the data to be read includes read information of the address to be read corresponding to the data to be read. The read information may include, but is not limited to, a read frequency or a read type, and indicates the frequency at which data is read from the address to be read per unit time. Accordingly, when the read information is first read information, the object storage device writes the data to be read into its nonvolatile cache. When the read information is second read information, the object storage device writes the data to be read into its nonvolatile cache and sets an expiration time in that cache, so that data whose residence time exceeds the expiration time is cleared from the nonvolatile cache. The frequency indicated by the first read information is greater than the frequency indicated by the second read information.
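As a sketch of this read-information handling (the threshold value, function names and cache interface are illustrative assumptions):

    #include <stdio.h>
    #include <stdbool.h>
    #include <stdint.h>

    /* Stand-in for the non-volatile cache; the patent does not specify this interface. */
    static void nv_cache_put(uint32_t lba, bool with_expiry) {
        printf("cache LBA %u in the non-volatile cache%s\n",
               lba, with_expiry ? " (cleared once resident longer than the expiration time)" : "");
    }

    /* read_freq: how often this address is read per unit time (from the DIF read field). */
    static void place_read_data(uint32_t lba, unsigned read_freq) {
        const unsigned hot_threshold = 100;   /* illustrative split between first and
                                                 second read information */
        if (read_freq >= hot_threshold)
            nv_cache_put(lba, false);         /* first read information: keep it cached */
        else
            nv_cache_put(lba, true);          /* second read information: cache with expiry */
    }

    int main(void) {
        place_read_data(42, 200);   /* frequently read address */
        place_read_data(43, 5);     /* rarely read address */
        return 0;
    }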
In some possible embodiments, the attribute information of the data to be read includes read information of the address to be read corresponding to the data to be read; the read information indicates the frequency at which data is read from the address to be read per unit time. When the frequency indicated by the read information is greater than or equal to a second threshold, the data to be read is sent to the client to be cached in the client's memory, so that the data can subsequently be read directly from the client's memory, improving read efficiency. Correspondingly, when the frequency indicated by the read information is smaller than the second threshold, the data to be read is sent to the client and cached in the client's hard disk.
In a second aspect, an embodiment of the present invention discloses a data processing method. The method includes: an object storage device receives a data write IO request forwarded by a client, where the data write IO request includes a data integrity field (DIF), and the DIF carries attribute information of the data to be written, such as the size of the data to be written, the write granularity of the data to be written, and write information of the address to be written corresponding to the data to be written. Correspondingly, the object storage device processes the data to be written according to the attribute information of the data to be written. By implementing this embodiment of the invention, the attribute information of the data to be written can be used to store and manage the data to be written, addressing the disk performance loss, storage resource waste and related problems of existing distributed storage systems.
In some possible embodiments, the attribute information of the data to be written includes write information of the address to be written corresponding to the data to be written, such as a write frequency or a write type. The write information indicates the frequency at which data is written to the address to be written per unit time. Accordingly, when the write information is first write information, the object storage device may store the data to be written in its nonvolatile cache. When the write information is second write information, the object storage device may store the data to be written on its hard disk. The frequency indicated by the first write information is greater than the frequency indicated by the second write information, and the read/write rate of the nonvolatile cache is greater than that of the hard disk.
By implementing these steps, the data to be written can be stored in the appropriate cache or hard disk according to its attribute characteristics, which helps improve data processing efficiency.
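A minimal sketch of this placement decision (the function names and the frequency boundary between the first and second write information are illustrative assumptions):

    #include <stdio.h>
    #include <stdint.h>

    /* Stand-ins for the two storage targets inside the object storage device. */
    static void nv_cache_write(uint32_t lba) { printf("LBA %u -> non-volatile cache\n", lba); }
    static void hard_disk_write(uint32_t lba) { printf("LBA %u -> hard disk\n", lba); }

    /* write_freq: how often the address to be written is written per unit time,
     * as carried by the data write field of the DIF. */
    static void place_write_data(uint32_t lba, unsigned write_freq) {
        const unsigned hot_threshold = 50;    /* illustrative split between first and
                                                 second write information */
        if (write_freq >= hot_threshold)
            nv_cache_write(lba);              /* first write information: fast NV cache */
        else
            hard_disk_write(lba);             /* second write information: slower hard disk */
    }

    int main(void) {
        place_write_data(7, 200);   /* frequently overwritten address */
        place_write_data(8, 1);     /* rarely written address */
        return 0;
    }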
In some possible embodiments, the attribute information of the data to be written includes the write granularity of the data to be written, which indicates the storage granularity used when the data to be written is stored. After receiving a data write IO request sent by the host, the client can allocate corresponding stripe information to the data to be written according to its attribute information, where the stripe information includes one or more stripe units used when the data to be written is stored and the physical address of each stripe unit. Each stripe unit is used to store a small part of the data to be written, which may also be referred to as data to be stored. The client then sends, according to the stripe information, each stripe unit used when storing the data to the corresponding object storage device, where the stripe unit carries its physical address and the data to be stored. Accordingly, the object storage device receives the stripe unit sent by the client and writes the part of the data to be written (the data to be stored) to its hard disk according to the physical address of the stripe unit.
By implementing the above steps, the client can configure a corresponding stripe for the data to be written according to the attribute information of the data. The stripe consists of one or more stripe units, and each stripe unit is mapped to a hard disk in an object storage device. Accordingly, after receiving a stripe unit, the object storage device writes or stores the data to its hard disk according to that stripe unit. This helps improve hard disk performance and utilization.
In some possible embodiments, after configuring the stripe information, the client may create a corresponding storage mapping relationship for the data to be written according to the write granularity of the data to be written. Correspondingly, the object storage device receives the storage mapping relationship of the data to be written sent by the client, where this storage mapping relationship contains the mapping between the address to be written and the physical address used when the data to be written is stored.
In some possible embodiments, the object storage device may also create the storage mapping relationship of the data to be stored according to the write granularity of the data to be written, where the storage mapping relationship contains the mapping between the logical address and the physical address used when the data to be stored is stored, and the logical address is related to the address to be written of the data to be written. Specifically, the logical address may be determined from the address to be written and the write granularity of the data to be written.
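One possible reading of "the logical address may be determined from the address to be written and the write granularity" is to align the address down to a granularity boundary and record the mapping; the sketch below illustrates that interpretation with made-up addresses, not the patent's exact rule:

    #include <stdio.h>
    #include <stdint.h>

    /* One entry of the storage mapping relationship: the logical address of a
     * write-granularity-sized chunk mapped to the physical address of the stripe
     * unit that holds it. */
    typedef struct {
        uint64_t logical_addr;
        uint64_t physical_addr;
    } map_entry_t;

    int main(void) {
        const uint64_t write_granularity = 4096;          /* from the DIF IO granularity field */
        const uint64_t addr_to_write     = 0x13200;       /* illustrative address to be written */
        const uint64_t stripe_unit_paddr = 0x9A000000ULL; /* illustrative physical address */

        map_entry_t e = {
            /* Derive the logical address by aligning the address to be written down
             * to a write-granularity boundary. */
            .logical_addr  = addr_to_write - (addr_to_write % write_granularity),
            .physical_addr = stripe_unit_paddr,
        };
        printf("logical 0x%llx -> physical 0x%llx (granularity %llu bytes)\n",
               (unsigned long long)e.logical_addr,
               (unsigned long long)e.physical_addr,
               (unsigned long long)write_granularity);
        return 0;
    }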
In a third aspect, an embodiment of the present invention discloses a data processing method. The method includes: a host obtains attribute information of data to be operated on and generates a data operation request according to that attribute information. The host then sends the data operation request to a client, the data operation request including a data integrity field (DIF) that carries the attribute information of the data to be operated on.
By implementing this embodiment of the invention, the host can obtain the attribute information of the data to be operated on according to actual requirements, generate a data operation request, and send it to the client or forward it through the client to the object storage device. The client or the object storage device can then store and manage the data to be operated on according to its attribute information, so that data is stored according to its attribute characteristics, disk performance loss is reduced, and waste of storage resources is avoided.
In some possible embodiments, the data operation request may be a data read IO request. Correspondingly, the attribute information of the data to be operated on may include the read granularity of the data to be read and/or read information of the address to be read corresponding to the data to be read, where the read information may specifically be a read frequency or a read type. The read granularity indicates the size of the pre-read data, which includes the data to be read. The read information indicates the frequency at which data is read from the address to be read per unit time.
In some possible embodiments, the data operation request may be a data write IO request. Correspondingly, the attribute information of the data to be operated on includes the write granularity of the data to be written and/or write information of the address to be written corresponding to the data to be written, where the write information may specifically be a write frequency or a write type. The write granularity indicates the storage granularity used when the data to be written is stored. The write information indicates the frequency at which data is written to the address to be written per unit time.
In a fourth aspect, an embodiment of the present invention discloses a data processing method. The method includes: a client receives a data write IO request sent by a host, where the data write IO request includes a data integrity field (DIF) that carries attribute information of the data to be written. Correspondingly, the client configures corresponding stripe information for the data to be written according to its attribute information, where the stripe information includes at least one stripe unit used when the data to be written is stored and the physical address of each stripe unit. The client then sends, according to the stripe information, each stripe unit, carrying part of the data to be written and the physical address, to the corresponding object storage device.
In some possible embodiments, the attribute information of the data to be written includes the write granularity of the data to be written. Accordingly, the client may configure for the data to be written a stripe whose stripe-unit size matches the write granularity; the information of that stripe (i.e., the stripe information) includes at least one stripe unit used when storing the data to be written and the physical address of each stripe unit. Each stripe unit is used to store part of the data to be written. In other words, the client selects a stripe whose stripe-unit size equals the write granularity to store the data to be written, which helps improve disk utilization.
In some possible embodiments, the attribute information of the data to be written includes write information of the address to be written corresponding to the data to be written, which indicates the frequency at which data is written to that address per unit time. Accordingly, the client may configure for the data to be written a stripe that matches the frequency indicated by the write information. The write information may be a write frequency or a write type. In other words, the client may store data to be written that has the same write type, or falls in the same write-frequency range, into the same stripe.
In a fifth aspect, an embodiment of the present invention provides a data processing apparatus (which may specifically be an object storage device), including a communication unit and a processing unit, where:
the communication unit is configured to receive a data read IO request, where the data read IO request includes a data integrity field DIF, and the DIF is used to carry attribute information of data to be read;
and the processing unit is used for processing the data to be read according to the attribute information of the data to be read.
In some possible embodiments, the attribute information of the data to be read includes the read granularity of the data to be read, which indicates the size of the pre-read data, including the data to be read. The processing unit is specifically configured to: when the size of the pre-read data is greater than or equal to a first threshold, read the pre-read data including the data to be read and send the pre-read data to a client to be cached in the client's memory; or, when the size of the pre-read data is smaller than the first threshold, read the data to be read and send it to the host.
In some possible embodiments, the attribute information of the data to be read includes read information of the address to be read corresponding to the data to be read, and the read information indicates the frequency at which data is read from the address to be read per unit time. The processing unit is specifically configured to: when the read information is first read information, write the data to be read into the nonvolatile cache of the object storage device; or, when the read information is second read information, write the data to be read into the nonvolatile cache of the object storage device and set an expiration time in that cache, so that data whose residence time in the nonvolatile cache exceeds the expiration time is cleared. The frequency indicated by the first read information is greater than the frequency indicated by the second read information.
For the content that is not shown or described in the embodiment of the present invention, reference may be made to the related explanations in the embodiment described in the foregoing first aspect, which are not described herein again.
In a sixth aspect, an embodiment of the present invention provides another data processing apparatus (specifically, an object storage device), including a communication unit and a processing unit, where:
the communication unit is configured to receive a data write IO request, where the data write IO request includes a data integrity field DIF, and the DIF is used to carry attribute information of data to be written;
and the processing unit is used for processing the data to be written according to the attribute information of the data to be written.
In some possible embodiments, the attribute information of the data to be written includes write information of the address to be written corresponding to the data to be written, where the write information indicates the frequency at which data is written to the address to be written per unit time. The processing unit is specifically configured to: when the write information is first write information, store the data to be written in the nonvolatile cache of the object storage device; or, when the write information is second write information, store the data to be written on the hard disk of the object storage device. The frequency indicated by the first write information is greater than the frequency indicated by the second write information.
In some possible embodiments, the attribute information of the data to be written includes the write granularity of the data to be written, which indicates the storage granularity used when the data to be written is stored. The client receives the data write IO request and configures corresponding stripe information for the data to be written according to its attribute information, where the stripe information includes a stripe unit used when the data to be written is stored and the physical address of that stripe unit; the client then sends the stripe unit, carrying the data to be written and the physical address, to the communication unit according to the stripe information, and the communication unit receives it. Correspondingly, the processing unit is specifically configured to store the data to be written on the hard disk of the object storage device according to the physical address of the stripe unit.
In some possible embodiments, the processing unit is further configured to create a storage mapping relationship of the data to be written according to a writing granularity of the data to be written, where the storage mapping relationship includes a mapping relationship between the address to be written and the physical address when the data to be written is stored.
For the content that is not shown or described in the embodiment of the present invention, reference may be made to the related explanations in the embodiment of the second aspect, which are not described herein again.
In a seventh aspect, an embodiment of the present invention provides another data processing apparatus (specifically, a host), including a communication unit and a processing unit, where:
the processing unit is used for acquiring attribute information of data to be operated;
the communication unit is configured to send a data operation request to a client, where the data operation request includes a data integrity field DIF, and the DIF is used to carry attribute information of the data to be operated.
In some possible embodiments, when the data operation request is a data read IO request, the attribute information of the data to be operated on includes the read granularity of the data to be read and/or read information of the address to be read corresponding to the data to be read. The read granularity of the data to be read indicates the size of the pre-read data, which includes the data to be read; the read information of the address to be read indicates the frequency at which data is read from the address to be read per unit time.
In some possible embodiments, when the data operation request is a data write IO request, the attribute information of the data to be operated on includes the write granularity of the data to be written and/or write information of the address to be written corresponding to the data to be written. The write granularity of the data to be written indicates the storage granularity used when the data to be written is stored; the write information of the address to be written indicates the frequency at which data is written to the address to be written per unit time.
For the content that is not shown or described in the embodiment of the present invention, reference may be made to the related explanation in the embodiment described in the foregoing third aspect, which is not described herein again.
In an eighth aspect, an embodiment of the present invention provides an object storage device, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the first aspect or any possible implementation of the first aspect.
In a ninth aspect, an embodiment of the present invention provides another object storage device, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the second aspect or any possible implementation of the second aspect.
In a tenth aspect, an embodiment of the present invention provides a host, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the third aspect or any possible implementation of the third aspect.
In an eleventh aspect, an embodiment of the present invention provides a client, including: a processor, a memory, a communication interface and a bus; the processor, the communication interface and the memory communicate with each other through the bus; the communication interface is for receiving and transmitting data; the memory is for storing instructions; the processor is for invoking the instructions in the memory to perform the method described in the fourth aspect or any possible implementation of the fourth aspect.
In a twelfth aspect, a computer non-transitory (non-transitory) storage medium storing program code for data processing is provided. The program code comprises instructions for performing the method described in the first aspect above or any possible implementation of the first aspect.
In a thirteenth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for carrying out the method described in the second aspect above or any possible implementation of the second aspect.
In a fourteenth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for performing the method described in the third aspect above or any possible implementation of the third aspect.
In a fifteenth aspect, a computer non-transitory storage medium storing program code for data processing is provided. The program code comprises instructions for carrying out the method described in the fourth aspect above or any possible implementation of the fourth aspect.
The above storage medium may be nonvolatile.
In a sixteenth aspect, there is provided a chip product for carrying out the method of the first aspect or any possible embodiment of the first aspect.
In a seventeenth aspect, there is provided a chip product for performing the method of the second aspect or any possible embodiment of the second aspect.
In an eighteenth aspect, there is provided a chip product for carrying out the method of the third aspect or any possible embodiment of the third aspect.
In a nineteenth aspect, there is provided a chip product for performing the method of the fourth aspect or any possible embodiment of the fourth aspect.
By implementing the embodiment of the invention, the problems of resource waste and performance loss in the existing distributed storage system can be solved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below.
Fig. 1 is a schematic diagram of a network framework of a data processing system according to an embodiment of the present invention.
Fig. 2 is a schematic flowchart of a data processing method according to an embodiment of the present invention.
Fig. 3 is a schematic format diagram of a data operation request according to an embodiment of the present invention.
Fig. 4 is a schematic flow chart of another data processing method according to an embodiment of the present invention.
Fig. 5 is a schematic flow chart of another data processing method according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of another data processing apparatus according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described in detail below with reference to the accompanying drawings of the present invention.
First, some technical terms related to the present application are introduced.
A logical block address (LBA), also known as a logical address or relative address, is the address used to retrieve or rewrite a block of data stored on disk or tape. It may also refer to the logical address given by an access instruction in a computing device with address translation functionality.
A physical block address, also known as a physical address, is the address transmitted on the central processing unit (CPU) address bus for addressing. Physical addresses are typically mapped to memory or storage. In a computing device with address translation functionality, a logical address can be translated into the actual effective address in memory (i.e., the physical address) by computation or by translation of the addressing mode.
Striping (stripe) is a method of dividing continuous data into data blocks of the same size and writing each data block to a different disk in an array. Two parameters affect the striping effect: stripe depth and stripe size. Stripe depth refers to the number of stripes that can be read or written in parallel at the same time. Stripe size refers to the size of the data block written to each disk. Examples are given later in this application.
To address the performance loss, resource waste and related problems of existing data storage schemes, this application provides a data processing method, a network framework to which the method applies, and related devices. First, refer to fig. 1, which is a schematic diagram of the network framework of a data processing system according to an embodiment of the present invention. The data processing system 100 shown in fig. 1 includes application software 102, a virtual disk interface 104, a client 106, a metadata controller (MDC) 108, and object-based storage devices (OSD) 110. Specifically:
the application software 102 may issue a data read/write input/output (IO) request to the virtual disk according to an actual requirement of the application software, where the IO request carries information such as a size of data to be read/written and a Logical Block Address (LBA) of the data storage, which is hereinafter referred to as a logical address.
The virtual disk interface 104 may be an interface provided by a virtual block storage (VBS) management component for accessing virtual disks. Specifically, the application software 102 sends a data read/write IO request to the corresponding virtual disk through the virtual disk interface 104, so as to read data from or write data to the corresponding disk. In practical applications, the virtual disk interface 104 and the application software 102 are usually deployed on the same physical device (e.g., a host or a server). The metadata controller 108 is responsible for managing the object storage devices 110; there may be one or more object storage devices 110, with n shown as an example, where n is a positive integer. In a distributed data storage system, there are typically multiple object storage devices 110.
The data processing system 100 may also be deployed in two parts, with the application software 102 and the virtual disk interface 104 deployed on a customer's device, and the client 106, the metadata controller 108, and the object storage devices 110 deployed within a cloud service provider's data center. The cloud service provider provides object storage services to customers through the client 106, the metadata controller 108, and the object storage devices 110. The customer accesses the object storage service through the application software 102 and the virtual disk interface 104.
Specifically, the metadata controller 108 may be responsible for maintaining the connection state between the object storage devices 110 and the client 106, including an online state and an offline state. The online state indicates that an object storage device 110 and the client 106 can communicate normally and a communication connection has been established. The offline state means there is no communication connection between the object storage device 110 and the client 106, i.e., in the offline state the client cannot store data into that object storage device. Optionally, the metadata controller 108 may also deploy the object storage devices 110 according to a certain deployment policy, such as a partition deployment policy or a load-balancing deployment policy, which is not limited here. Accordingly, the metadata controller 108 knows the information of each object storage device OSD currently communicating with the client 106, such as the internet protocol (IP) address of the OSD and the identifier of the OSD. Optionally, the metadata controller 108 may send the information of the object storage devices 110 to the client 106 in advance, so that after the client 106 receives a data read/write IO request, it can compute the corresponding object storage device 110 from the LBA logical address in the request, which is not described in detail here. Further, the client 106 may establish communication with the object storage device 110 according to the information of the OSD (e.g., its identifier or IP address), for example so that the client 106 can forward the data read/write IO request to that object storage device.
In practical applications, the metadata controller 108 may be deployed to a physical device alone, or may be deployed to one or more physical devices in a distributed manner, without limitation.
The client 106 may be a software module configured to receive data read/write IO requests issued by the application software 102 through the virtual disk interface 104. Further, the client 106 may configure a corresponding data stripe for the data to be read/written according to the data read/write IO request, so that the data is written or stored into that stripe. In other words, data is stored in a striped manner in this application. Optionally, the client 106 may also calculate an EC check code for the data using a preset algorithm. The preset algorithm is defined by the user or the system, for example an EC check code algorithm. How the EC check code is calculated is not described in detail in this application; how the data stripes are configured is described in detail below.
Optionally, the client 106 may send the data stripe in which the data is to be stored, the EC check code, and the received data read/write IO request to the corresponding object storage device OSD together or separately. For example, after obtaining the data stripe and the EC check code, the client 106 may add these two pieces of information to the data read/write IO request, repackage it, and send the repackaged data read/write IO request to the object storage device; this application is not limited in this respect.
Accordingly, the object storage device 110 may receive a data read/write IO request and read or write the corresponding data according to the request. When the object storage device 110 receives a data read IO request, it reads the data block, the metadata, and so on from the corresponding virtual disk according to the LBA logical address in the request. Metadata here refers to data that describes the attributes of a data block, such as one or more of the logical address at which the data block is stored, the physical address, and the size of the data block. Accordingly, when the object storage device 110 receives a data write IO request, data can be written to the corresponding virtual disk according to the LBA logical address in the request. How the object storage device performs the corresponding data operations according to a data read/write IO request is described in detail below.
In actual practice, the application software 102, the virtual disk interface 104, and the client 106 are typically deployed on one physical device. Object storage device 110 is deployed to another physical device. Alternatively, object storage device 110 may be deployed on the same physical device as application software 102, client 106, and the like. The metadata controller 108 is deployed separately to another physical device. The present application is not limited with respect to the deployment of the components in fig. 1.
In an alternative embodiment, each object storage device 110 includes a cache and a disk. The numbers of caches and disks are not limited; one cache and one disk are shown as an example. The disk includes, but is not limited to, a virtual disk or a physical disk. To avoid the disk contention caused by multiple processes accessing the same disk, this application uses striping to balance the IO load (data) across the disks of multiple object storage devices. Data storage is described below taking n object storage devices as an example: the disk of each of the n object storage devices provides a corresponding storage space (a stripe unit) for storing a small portion of the data. As shown in the figure, the data may be divided into n parts, yielding n data blocks, each containing a small portion of the data. Accordingly, each of the n object storage devices may provide one stripe unit to store one data block, so that n stripe units are obtained to store the n data blocks. As shown, object storage device 1 may provide stripe unit 1 to store the first data block (data block 1), and object storage device 2 may provide stripe unit 2 to store the second data block (data block 2). By analogy, object storage device n may provide stripe unit n to store the nth data block (data block n). In other words, the stripe for storing the entire data consists of n stripe units from the n object storage devices, i.e., one stripe unit provided by each of the n disks.
Accordingly, after an object storage device learns of the stripe unit carrying its data block, it can store the data block in that stripe unit according to the stripe unit's physical address, that is, write or store the data block to its own disk at that physical address. For example, taking object storage device 1, after it learns that stripe unit 1 needs to store data block 1, it can write data block 1 to stripe unit 1 according to the physical address of stripe unit 1, i.e., to its own disk.
Similarly, when the n object storage devices each obtain the data block to be stored in their own stripe unit, each can store its data block according to the physical address of the corresponding stripe unit. When all n object storage devices have finished storing their respective data blocks, storage of the entire data is complete.
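The sketch below illustrates this stripe-unit flow with made-up sizes and physical addresses (the structures and the osd_write stand-in are assumptions; a real object storage device would persist each block to its own disk):

    #include <stdio.h>
    #include <string.h>
    #include <stdint.h>

    #define N_OSD      4     /* illustrative number of object storage devices */
    #define BLOCK_SIZE 8     /* illustrative stripe-unit (data block) size in bytes */

    /* A stripe unit as handed to one object storage device: where to write and what to write. */
    typedef struct {
        int      osd_id;
        uint64_t phys_addr;            /* physical address inside that OSD's disk */
        uint8_t  block[BLOCK_SIZE];    /* the data block: a small part of the whole data */
    } stripe_unit_t;

    /* Stand-in for the OSD-side write; a real OSD would persist the block to its disk. */
    static void osd_write(const stripe_unit_t *u) {
        printf("OSD %d writes %d bytes at physical address 0x%llx\n",
               u->osd_id, BLOCK_SIZE, (unsigned long long)u->phys_addr);
    }

    int main(void) {
        uint8_t data[N_OSD * BLOCK_SIZE];              /* the whole data to be written */
        memset(data, 0xAB, sizeof(data));

        /* Client side: cut the data into n blocks and hand block i, together with the
         * physical address of stripe unit i, to object storage device i. The physical
         * addresses here are illustrative. */
        for (int i = 0; i < N_OSD; i++) {
            stripe_unit_t u = { .osd_id = i, .phys_addr = 0x1000u + (uint64_t)i * BLOCK_SIZE };
            memcpy(u.block, data + i * BLOCK_SIZE, BLOCK_SIZE);
            osd_write(&u);                              /* OSD side: write by physical address */
        }
        return 0;
    }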
The cache in the object storage device is used for temporarily storing data. Optionally, a timer may also be set in the cache; when the time for which data has been stored in the cache exceeds a preset expiration time, the data in the cache can be processed, for example cleared or written to the hard disk.
Next, please refer to fig. 2, which is a flowchart illustrating a data processing method according to an embodiment of the present invention. The data processing method as shown in fig. 2 includes the following implementation steps:
Step S202: the host obtains attribute information of the data to be operated on and generates a data operation request according to that attribute information.
Step S204: the host sends the data operation request to the client. Accordingly, the client receives the data operation request and forwards it to the object storage device.
In this application, a host refers to a physical device running application software. Specifically, the application software in the host can obtain the attribute information of the data to be operated on according to actual requirements. The host may then generate a data operation request, encapsulate the attribute information of the data to be operated on into a data integrity field (DIF) in the data operation request, and send the data operation request to the client so that it is forwarded to the object storage device through the client.
The data to be operated on here may include, but is not limited to, data to be read or data to be written, etc. Accordingly, the attribute information of the data to be operated may specifically refer to attribute information of the data to be written, attribute information of the data to be read, or the like. The attribute information about the data to be operated on is specifically set forth below.
Step S206: the object storage device obtains the data operation request, where the data operation request includes attribute information of the data to be operated on, and the attribute information describes attributes of the data to be operated on.
Step S208: the object storage device processes the data to be operated on according to the attribute information of the data to be operated on.
In this application, the data operation request may specifically be a data read IO request or a data write IO request. When the data operation request is a data read IO request, the data to be operated may be specifically data to be read. When the data operation request is a data write IO request, the data to be operated may specifically be data to be written.
The data operation request includes a flag field customized by a user or a system, where the flag field is used to indicate attribute information of the data to be operated, such as a size of the data to be operated, a data IO granularity, a logical address of the data to be operated, read information (e.g., a read frequency or a read type) and write information (e.g., a write frequency or a write type) related to the logical address, which is described in detail below.
Specifically, fig. 3 shows a format diagram of a data operation request. The data operation request includes a data field and a data integrity field (DIF). The data field carries the data to be transmitted, which may specifically be the data to be operated on (data to be read or data to be written). The DIF field is used for checking the integrity of the data to be transmitted and, as shown in the figure, consists of a check area, a version area, an application software area and an address area. The respective sizes of the data field and the DIF field may be set by the user or the system; for example, storage formats such as 512+8 bytes, 4096+8 bytes or 4096+64 bytes may be used, and this application is not limited in this respect. Specifically:
the check area is used to carry check data, such as Cyclic Redundancy Check (CRC) data. The byte size occupied by the check region can be customized by a user or a system, for example, 2 bytes.
The version area is used to indicate the version number of the DIF, and the occupied size of the version area can be customized by a user or a system, for example, 1 byte.
The address area indicates the logical address (LBA) corresponding to the data to be operated on. Its size can be set by the user or the system, for example 4 bytes.
The application software area is composed of a reserved field and an LBA valid indication field, and the size occupied by the application software area can be specifically set by a user or a system, for example, the illustration shows 1 byte as an example. The LBA valid indication field is used to indicate whether the LBA logical address carried by the address area is valid. Specifically, when the LBA valid indication field is a first preset character (for example, 1), it indicates that the LBA logical address carried in the address area is valid; when the LBA valid indication field is a second predetermined character (e.g. 0), it indicates that the LBA logical address carried in the address area is invalid. The size occupied by the LBA valid indication field may be specifically set by a user or a system in a self-defined manner, for example, 1 bit (bit).
The reserved field is a field to be defined, and its size can also be set by the user or the system, for example 7 bits. This application redefines the reserved field, i.e., carries the attribute information of the data to be operated on in the reserved field. Specifically, the reserved field includes, but is not limited to, any one or more of a data IO granularity field, a data write field and a data read field; the size and position of each may be defined according to actual requirements, which is not limited in this application. Specifically:
The data IO granularity field has a different meaning in different application scenarios. In a data write scenario (i.e., the data operation request is a data write IO request), the data IO granularity field is a write granularity field indicating the minimum unit size used when the data to be operated on is stored, in other words the minimum granularity into which the data to be stored is divided when it is stored. The size and position of the write granularity field can be set by the user or the system. For example, if the write granularity field occupies 3 bits, it can represent 8 write granularities, such as 512 bytes, 1 Kbyte, 4 Kbytes, 8 Kbytes, 16 Kbytes, 32 Kbytes, 64 Kbytes and 12 Kbytes.
In a data read scenario (i.e., the data operation request is a data read IO request), the data IO granularity field is a read granularity field indicating the size of the pre-read data, where the pre-read data includes the data to be operated on. The pre-read data is data that is expected to be read. For example, suppose the upper-layer application software needs to read 1 MB of data (i.e., the size of the pre-read data is 1 MB), and each data read IO request supported by the application software is 1 KB, that is, each data read IO request issued by the application software asks for 1 KB of data. To finish reading the 1 MB of data, the application software would need to issue about 1000 data read IO requests. With this application, however, the read granularity field can indicate the size of the pre-read data, so the pre-read data can be fetched in advance with a single data read IO request, saving time and improving read efficiency.
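The arithmetic of this example, for concreteness (purely illustrative numbers restating the text above):

    #include <stdio.h>

    int main(void) {
        /* Numbers from the example above: the application wants 1 MB, but each data
         * read IO request it issues only asks for 1 KB. */
        const unsigned wanted_kb = 1024;     /* pre-read data: 1 MB */
        const unsigned per_request_kb = 1;   /* data requested per read IO request: 1 KB */

        printf("without a read granularity field: %u read IO requests\n",
               wanted_kb / per_request_kb);  /* roughly 1000 requests */
        printf("with the read granularity in the DIF: 1 request pre-reads %u KB\n",
               wanted_kb);
        return 0;
    }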
The data write field indicates write information of the address to be written, where the address to be written is the address at which the data to be operated on is written and may specifically be an LBA logical address. The write information includes, but is not limited to, a write type or a write frequency; in other words, the data write field indicates the write type or write frequency of the address to be written. Specifically, the data write field may reflect the frequency at which data is written to the address to be written per unit time, i.e., how often data writing occurs at that address per unit time.
Optionally, in this application a device (specifically the client or the object storage device) may further derive the write type of the address to be written from its write frequency. For example, when the write frequency of the address to be written falls within a first threshold range, its write type may be considered or determined to be a first write type, for example frequently overwritten. When the write frequency of the address to be written falls within a second threshold range, its write type may be considered or determined to be a second write type, for example rarely overwritten. The upper and lower limits of the first and second threshold ranges may be set by the user or the system, and the lower limit of the first threshold range is greater than or equal to the upper limit of the second threshold range.
The size of the data write field can be set by the user or the system according to actual requirements, for example 2 bits. When data is written to the same address multiple times, the data is overwritten; that is, a later write overwrites the previously written data. Therefore, the write frequency or write type of the address to be written also indicates how frequently data overwriting occurs at that address.
The data read field indicates read information of the address to be read. The address to be read is the address from which the data to be operated on is read, and may specifically be an LBA logical address. The read information includes, but is not limited to, a read type or a read frequency of the address to be read. Specifically, the data read field may reflect the frequency at which data is read from the address to be read per unit time.
Optionally, the device of this application (specifically the client or the object storage device) may further derive the read type of the address to be read from its read frequency. For example, when the read frequency of the address to be read falls within a third threshold range, its read type may be considered or determined to be a first read type, for example frequently read. When the read frequency falls within a fourth threshold range, its read type may be considered or determined to be a second read type, for example read several times or read sequentially. When the read frequency falls within a fifth threshold range, its read type may be considered or determined to be a third read type, for example rarely read or read only once. The third, fourth and fifth threshold ranges may be set by the user or the system, where the lower limit of the third threshold range is greater than or equal to the upper limit of the fourth threshold range, and the lower limit of the fourth threshold range is greater than or equal to the upper limit of the fifth threshold range.
The size of the data read field can be set by the user or the system, for example 2 bits.
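Putting the DIF fields described above together, one possible in-memory layout is sketched below; the field sizes follow the text, but the exact bit ordering within the application software area and the packing pragma are assumptions:

    #include <stdio.h>
    #include <stdint.h>

    /* One possible layout of the 8-byte DIF described above (512+8 mode). */
    #pragma pack(push, 1)
    typedef struct {
        uint16_t crc;                 /* check area: CRC of the data field (2 bytes) */
        uint8_t  version;             /* version area (1 byte) */
        /* application software area (1 byte): LBA-valid flag + redefined reserved field */
        uint8_t  lba_valid      : 1;  /* 1 = the address area carries a valid LBA */
        uint8_t  io_granularity : 3;  /* write/read granularity code (8 possible values) */
        uint8_t  write_info     : 2;  /* write frequency/type of the address to be written */
        uint8_t  read_info      : 2;  /* read frequency/type of the address to be read */
        uint32_t lba;                 /* address area: logical block address (4 bytes) */
    } dif_t;
    #pragma pack(pop)

    int main(void) {
        dif_t dif = { .crc = 0xBEEF, .version = 1, .lba_valid = 1,
                      .io_granularity = 2, .write_info = 1, .read_info = 0,
                      .lba = 0x12345678 };
        /* The DIF should stay 8 bytes so it can be appended as 512+8 or 4096+8. */
        printf("sizeof(dif_t) = %zu, lba = 0x%x\n", sizeof(dif_t), dif.lba);
        return 0;
    }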
The following describes embodiments of the data storage method provided by the present invention in the context of two application scenarios to which the invention applies. First, for the data write scenario, refer to fig. 4, which is a schematic flowchart of another data processing method proposed in an embodiment of the present invention. As shown in fig. 4, the method includes the following steps:
Step S402: the application software obtains attribute information of the data to be written.
Step S404: the application software sends a data write IO request to the client, where the data write IO request includes a data integrity field (DIF), and the DIF carries the attribute information of the data to be written. Accordingly, the client receives the data write IO request.
In this application, the application software (specifically, the host running the application software) can issue a data write IO request to the client according to actual requirements, where the data write IO request includes the attribute information of the data to be written. Specifically, the attribute information of the data to be written is carried in the DIF field of the data write IO request, and may include, but is not limited to, any one or a combination of the following: the address to be written (specifically an LBA logical address) corresponding to the data to be written, the size of the data to be written, the write information of the address to be written, and the data IO granularity. Accordingly, the client receives the data write IO request.
Step S406, the client configures corresponding stripe information for the data to be written according to the attribute information of the data to be written, wherein the stripe information includes at least one stripe unit used when the data to be written is stored and a physical address of each stripe unit in the at least one stripe unit.
Step S408, the client sends the at least one stripe unit and the physical address of each stripe unit to the corresponding object storage devices according to the stripe information. Each stripe unit is used to store a portion of the data to be written. Correspondingly, an object storage device receives a stripe unit and the physical address of the stripe unit sent by the client, where the stripe unit carries data to be stored, and the data to be stored is part of the data to be written. Optionally, the stripe unit also carries the physical address of the stripe unit.
Step S410, the object storage device processes the data to be stored in the data to be written according to the attribute information of the data to be written.
In step S406, after receiving the data write IO request, the client may determine, according to the attribute information of the data to be written in the data write IO request, whether to configure a corresponding data stripe for the data to be written so as to store it. Specifically, this applies when the attribute information of the data to be written includes the write information of the address to be written corresponding to the data to be written. The write information may be a write frequency or a write type, and may be used to indicate the frequency, or frequency range, with which data writes occur at the address to be written per unit time. When the write information is first write information, used to indicate that the frequency of data writes at the address to be written per unit time is within a first threshold range, or when the write type indicated by the first write information is soon overwritten or frequently overwritten, the client may give up configuring corresponding stripe information for the data to be written. Conversely, when the write information is second write information, used to indicate that the frequency of data writes at the address to be written per unit time is within a second threshold range, or when the write type indicated by the second write information is rarely overwritten, the client may configure corresponding stripe information for the data to be written, as described in detail below. The lower limit of the first threshold range is greater than or equal to the upper limit of the second threshold range.
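A minimal sketch of this decision, assuming the write information is expressed as a numeric write frequency and using an illustrative threshold value that is not taken from the original:

```python
FIRST_RANGE_LOWER = 50.0   # lower limit of the first threshold range (writes/min, assumed)

def should_configure_stripe(write_frequency: float) -> bool:
    """Decide whether the client configures stripe information for the data to be written.

    A frequently overwritten address (first write information) is not striped;
    a rarely overwritten address (second write information) is striped.
    """
    if write_frequency >= FIRST_RANGE_LOWER:
        return False   # first write information: give up configuring stripe information
    return True        # second write information: configure stripe information

print(should_configure_stripe(5.0))  # -> True, stripe the rarely overwritten data
```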
Optionally, after receiving the data write IO request, the client may instead configure corresponding stripe information for the data to be written according to the attribute information of the data to be written in the data write IO request and issue the stripe information to the corresponding object storage device, and the object storage device side then determines again, according to the attribute information of the data to be written, whether to store the data at the physical address of the corresponding stripe unit in the stripe information. This is not limited in the present application.
The client configures corresponding stripe information for the data to be written according to the attribute information of the data to be written; specifically, the following implementations are provided.
First, when the attribute information of the data to be written includes the write granularity of the data to be written, the client may select, according to the write granularity, stripe units whose size matches the write granularity to store the data to be written. Specifically, to reduce the amount of data stored, stripe units whose stripe size equals the write granularity are generally selected to form a stripe for storing the data to be written. After each stripe unit in the stripe is obtained, the physical address of each stripe unit can be obtained; the stripe units are in one-to-one correspondence with their physical addresses.
For example, assume a data write IO request size of 32K with a write granularity of 8K. Accordingly, when configuring a stripe for the data to be written, the client may select 4 stripe units of 8K each and combine them into a stripe storing the data to be written. In other words, the stripe here includes 4 stripe units of 8K each. Accordingly, during data storage, the 32K of data to be written can be divided into 4 pieces of data to be stored, each 8K in size, with each piece stored in one stripe unit. These 4 stripe units may come from 4 different object storage devices, which is not discussed further here.
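The 32K/8K example above can be written as a short sketch; the helper name and the byte values are assumptions for illustration.

```python
def split_into_stripe_units(data: bytes, granularity: int) -> list:
    """Split the data to be written into stripe-unit-sized pieces of data to be stored."""
    return [data[i:i + granularity] for i in range(0, len(data), granularity)]

data_to_write = b"\xab" * 32 * 1024                       # a 32K data write IO request
units = split_into_stripe_units(data_to_write, 8 * 1024)  # write granularity of 8K
assert len(units) == 4                                    # 4 stripe units of 8K each
```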
Second, when the attribute information of the data to be written includes the write information of the address to be written corresponding to the data to be written, the write information may be a write frequency or a write type. Accordingly, the client can allocate corresponding stripe information to the data to be written according to the write information, so as to store the data to be written. The stripe information includes one or more stripe units and the physical address of each stripe unit, and each stripe unit is used to store a portion of the data to be written (i.e., data to be stored).
Specifically, when the write information is a write frequency, the client may write or store data to be written whose write frequency falls within the same frequency range into the same stripe or stripe unit. The frequency ranges can be customized by the user or the system according to actual requirements, for example, 10-15 times per minute. When the write information is a write type, the client may store data to be written of the same write type into the same stripe or stripe unit, so as to implement classified storage of the data.
Third, the attribute information of the data to be written includes both the write granularity of the data to be written and the write information of the address to be written corresponding to the data to be written. Accordingly, the client may configure corresponding stripe information for the data to be written according to the write granularity and the write information of the address to be written, where the stripe information includes one or more stripe units for storing the data to be written and the physical address of each stripe unit. Each stripe unit is used to store a portion of the data to be written (i.e., data to be stored). Illustratively, a stripe unit whose stripe size equals the write granularity is selected, and the stripe unit supports carrying or storing data matching the write information of the address to be written. In other words, the client configures a corresponding stripe for the data to be written taking into account both the write granularity of the data to be written and the write information (write frequency or write type) of the address to be written, so as to store the data to be written.
In an optional embodiment, after receiving the data write IO request, the client may further calculate a corresponding EC check code for the data to be written, for example, by using a preset algorithm. Accordingly, the client can send the calculated EC check code together with the data to be written to the object storage device, so that the object storage device stores the data to be written and the EC check code together.
Optionally, in the process of storing data in stripes, for the EC check code of a stripe that is not a full stripe, the write type of the address to be written in the stripe may be marked as a preset type, for example, to be overwritten. This makes it convenient for the system background to later recalculate a new EC check code by combining the data to be written into a full stripe and to overwrite the originally calculated EC check code, which helps improve the utilization of disk storage space.
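The text does not name the EC algorithm; the sketch below uses plain XOR parity purely as a stand-in for a real erasure code (e.g. Reed-Solomon), so it should be read as an assumption-laden illustration rather than the disclosed method.

```python
from functools import reduce

def ec_check_code(stripe_units):
    """Compute a check block over equally sized stripe units.

    XOR parity is used here only as a placeholder for the EC check code;
    a real deployment would typically use an erasure code such as Reed-Solomon.
    """
    return reduce(lambda a, b: bytes(x ^ y for x, y in zip(a, b)), stripe_units)

units = [bytes([i]) * 8192 for i in range(4)]  # four 8K stripe units (example data)
check = ec_check_code(units)                   # sent to the object storage device with the data
assert len(check) == 8192
```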
After configuring stripe information for the data to be written, the client may distribute the one or more stripe units in the stripe information to the corresponding object storage devices, where a stripe unit may carry its physical address and the data to be stored, the data to be stored being part of the data to be written. For the mapping relationship between stripe units and object storage devices, reference may be made to the related explanations in the foregoing embodiments, and details are not repeated here. In other words, after learning the stripe unit used to store the data to be stored, the client can learn the physical address of the stripe unit, and can further learn the object storage device, or the hard disk (magnetic disk) of the object storage device, where the stripe unit is located; this is not limited or described in detail here.
Accordingly, the object storage device may subsequently write the data to be stored into its hard disk according to the received stripe unit and the physical address of the stripe unit.
In step S410, after receiving the data write IO request sent by the client and the stripe unit carrying the data to be stored and the physical address, the object storage device may store the data to be stored according to the attribute information of the data to be written in the data write IO request.
Specifically, the attribute information of the data to be written includes the write information (e.g., write frequency or write type) of the address to be written. When the write information is first write information, the first write information is used to indicate the frequency of data writes at the address to be written; specifically, it may indicate that the write frequency of the address to be written is within the first threshold range, or that the write type of the address to be written is a first type (e.g., frequently overwritten or soon overwritten). The object storage device may accordingly write or store the data to be written into a first nonvolatile cache (typically deployed inside the object storage device). Optionally, after a period of time elapses, the object storage device may write the data stored in the first nonvolatile cache to a nonvolatile hard disk of the object storage device, so that the storage medium for the data to be written is reasonably arranged based on the characteristics of the data, improving data read and write efficiency. Because the write frequency of the address to be written is high, the data to be written is held in the cache for a period of time before being written to the hard disk. If the object storage device receives other new write IO requests for the same address to be written during this period, the data carried by the new write IO requests overwrites the old data in the cache, and only the data of the latest write IO request for that address is written to the hard disk when the period ends. This avoids the inefficiency of writing the data carried by every write IO request for that address to the hard disk.
When the write information is second write information, the second write information is used to indicate that the write frequency of the address to be written is within the second threshold range, or that the write type of the address to be written is a second type (e.g., rarely overwritten); either form indicates the frequency with which data writes occur for the data to be written. Accordingly, the object storage device can directly write or store the data to be written into its nonvolatile hard disk. Specifically, the object storage device writes or stores the data to be stored in the data to be written into the nonvolatile hard disk according to the physical address of the stripe unit.
The first threshold range and the second threshold range are preset by a user or a system, and the lower limit value of the first threshold range is greater than or equal to the upper limit value of the second threshold range. The read-write speed of the nonvolatile cache is greater than that of the nonvolatile hard disk.
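A minimal sketch of this write path, assuming a numeric write frequency and in-memory stand-ins for the first nonvolatile cache and the hard disk; the class name, threshold value, and destaging helper are illustrative assumptions.

```python
class ObjectStorageDeviceWritePath:
    """Route writes to a nonvolatile cache or directly to the hard disk
    according to the write information carried in the DIF."""

    FIRST_RANGE_LOWER = 50.0   # lower limit of the first threshold range (assumed)

    def __init__(self):
        self.first_nv_cache = {}   # first nonvolatile cache (faster medium)
        self.hard_disk = {}        # nonvolatile hard disk

    def write(self, lba: int, data: bytes, write_frequency: float) -> None:
        if write_frequency >= self.FIRST_RANGE_LOWER:
            # first write information: the address is overwritten often, stage in the cache
            self.first_nv_cache[lba] = data
        else:
            # second write information: rarely overwritten, write straight to the hard disk
            self.hard_disk[lba] = data

    def destage(self) -> None:
        """After a period of time, write the latest cached data to the hard disk."""
        self.hard_disk.update(self.first_nv_cache)
        self.first_nv_cache.clear()
```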
In other words, the object storage device may store or write data to be written of different write frequencies or write types into different nonvolatile memories. For example, if the write type of the address to be written corresponding to the data to be written is soon overwritten, the object storage device may write the data to be written into the nonvolatile cache rather than into the hard disk. Optionally, the object storage device may further store data to be written whose write type is frequently overwritten separately from data whose write type is rarely overwritten, to reduce data access interference.
In an optional embodiment, the attribute information of the data to be written includes a writing granularity of the data to be written. After the object storage device receives the data write IO request and the stripe unit carrying the data to be stored, the object storage device may construct a storage mapping relationship of the data to be stored according to the write granularity of the data to be written. The storage mapping relation comprises a mapping relation between a logical address and a physical address when the data to be stored is stored. The logical address is determined according to the address to be written and the write granularity.
Specifically, the data write IO request received by the client carries the address to be written (usually an LBA logical address) of the data to be written. Accordingly, after the client configures stripe information (one or more stripe units) for the data to be written, the address to be written may be divided according to the write granularity of the data to be written (or the number of stripe units), so as to obtain the logical address corresponding to each stripe unit. Further, the logical address and physical address of each stripe unit, together with the data to be stored that the stripe unit needs to store, are sent to the corresponding object storage device. Correspondingly, after receiving this information, the object storage device may create a corresponding storage mapping relationship for the data to be stored, that is, the storage mapping relationship of the data to be stored, which includes the mapping between the logical address and the physical address at which the data to be stored (the stripe unit) is stored.
Alternatively, after the client sends the stripe unit and the data write IO request to the object storage device, the object storage device may obtain the logical address corresponding to the stripe unit, that is, the logical address at which the data to be stored in the stripe unit is stored, according to the address to be written in the data write IO request and the write granularity of the data to be written. How the logical address of the stripe unit is obtained is not limited in this application. For example, assume the client configures 4 stripe units for the data to be written, stripe unit 1 through stripe unit 4, located in object storage device 1 through object storage device 4, respectively. When object storage device 4 receives stripe unit 4 sent by the client, the stripe unit carries its own physical address and the data to be stored. The object storage device then determines the logical address of stripe unit 4, here 30-40, in combination with the address to be written (assumed to be 0-40) in the received data write IO request. Finally, it creates a storage mapping relationship for the data to be stored, i.e., the mapping between the logical address and the physical address of the data to be stored in stripe unit 4.
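The address division in this example can be sketched as follows; the helper name is hypothetical, and the even split is an assumption that matches the 0-40, 4-stripe-unit example.

```python
def logical_ranges(lba_start: int, lba_end: int, num_units: int) -> list:
    """Divide the address to be written into one logical address range per stripe unit."""
    step = (lba_end - lba_start) // num_units
    return [(lba_start + i * step, lba_start + (i + 1) * step) for i in range(num_units)]

# Example from the text: address to be written 0-40, 4 stripe units.
ranges = logical_ranges(0, 40, 4)
assert ranges[3] == (30, 40)   # stripe unit 4 stores the logical address range 30-40
```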
In an optional embodiment, after configuring the stripe information for the data to be written, the client may further create a corresponding storage mapping relationship for the data to be written according to the writing granularity of the data to be written, that is, the storage mapping relationship of the data to be written. The storage mapping relationship includes a mapping relationship between a logical address (i.e., a to-be-written address) and a physical address when the to-be-written data is stored.
Specifically, the data write IO request includes the LBA logical address of the data to be written, but this logical address is not related or bound to the physical address of the stripe unit where the data to be written is actually stored. Consequently, after the data to be written is stored, the system cannot determine from its logical address the physical address where the data actually resides, and thus cannot look up the data to be written. Therefore, a storage mapping relationship for the data to be written also needs to be established and stored; this storage mapping relationship is obtained by associating and binding the logical address and the physical address at which the data to be written is stored, and it is divided or customized according to the write granularity.
For example, assume that the size of a data write IO request (specifically, the size of the data to be written in the request) is 32K bytes, the LBA logical address of the data to be written carried in the request is 0 to 200, and the write granularity of the data to be written is 8K. The stripe configured by the client for the data to be written includes 4 stripe units, whose physical addresses are 0X00-0X04. Accordingly, the object storage device creates a storage mapping relationship table as shown in table 1 below, which reflects the storage mapping relationship of the data to be written.
TABLE 1
As can be seen from table 1 above, the stripe used to store the data to be written is composed of 4 stripe units, each 8K in size. Each stripe unit has its own logical address and physical address. Accordingly, after the object storage device receives a data read IO request that includes the logical address of the data to be read (e.g., 0 to 50), the object storage device can find the physical address corresponding to that logical address (here, physical address 0X01 of stripe unit 1) according to the storage mapping relationship recorded in table 1, and the data read from stripe unit 1 at 0X01 is the data to be read.
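For illustration, the lookup can be sketched as below. Only the first row (logical 0-50 at physical 0X01) is stated in the text; the remaining rows are assumed by extending the example evenly and are not taken from Table 1 itself.

```python
# Hypothetical storage mapping relationship in the spirit of Table 1.
# Keys are the logical (LBA) address range of each stripe unit; values are physical addresses.
storage_mapping = {
    (0, 50):    "0X01",   # stripe unit 1 (given in the example)
    (50, 100):  "0X02",   # stripe unit 2 (assumed)
    (100, 150): "0X03",   # stripe unit 3 (assumed)
    (150, 200): "0X04",   # stripe unit 4 (assumed)
}

def lookup_physical(logical_start: int, logical_end: int):
    """Find the physical address whose logical range covers the requested range."""
    for (lo, hi), phys in storage_mapping.items():
        if lo <= logical_start and logical_end <= hi:
            return phys
    return None

assert lookup_physical(0, 50) == "0X01"   # a read of logical 0-50 is served from 0X01
```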
In an optional embodiment, the client may further send the storage mapping relationship of the data to be written to each object storage device, so that the object storage device knows the mapping between the logical address and the physical address of each stripe unit used for data storage. After the object storage device receives the logical address of the data to be read carried in a data read IO request, it obtains the data to be read from the corresponding physical address according to that logical address.
By implementing this embodiment of the invention, the storage layout of the data is optimized by redefining the fields of the DIF field; compared with the prior art, the storage performance of the storage device can be improved and its utilization increased.
Next, in the application scenario of reading data, refer to fig. 5, which is a schematic flowchart of another data processing method provided in an embodiment of the present invention. The method includes the following steps:
Step S502, the application software (specifically, the host running the application software) acquires the attribute information of the data to be read.
Step S504, the application software sends a data read IO request to the client, where the data read IO request includes a DIF field, and the DIF is used for bearing the attribute information of the data to be read. Accordingly, the client receives the data read IO request. For the data read IO request, reference may be made to the foregoing description of the data write IO request, and details are not repeated here.
Step S506, the client sends the data read IO request to the object storage device. Accordingly, the object storage device receives the data read IO request.
In this application, after the client receives the data read IO request, if the attribute information of the data to be read in the request includes the size of the pre-read data and that size is greater than the first threshold, the data read IO request can be issued to the object storage device in advance, so that the pre-read data can be obtained more quickly. There are various specific ways of issuing the request in advance, which are not limited in this application. For example, the client may allocate more transmission resources to the data read IO request to increase its issuing rate, so that the request reaches the object storage device earlier. When the size of the pre-read data is smaller than or equal to the first threshold, the data read IO request is issued to the object storage device at the normal rate.
The pre-read data comprises the data to be read, and can be understood as the total size of the data required to be read by the application software within a period of time. Understandably, there are limits to the size of the data read IO request that the application software sends each time, limited by the performance of the application software itself. When the pre-read data is large, the application software can complete reading of the pre-read data with corresponding size by issuing the data read IO request for multiple times. For example, the size of the pre-read data is 1M, and the size of the data read IO request is 1K (that is, the size of the data to be read requested in the data read IO request is 1K), the application software may complete reading of the pre-read data through 1000 data read IO requests.
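The 1M / 1K example reduces to simple arithmetic; the sketch below also shows the issue-in-advance decision, with the first threshold value being an assumption.

```python
FIRST_THRESHOLD = 256 * 1000   # assumed value of the first threshold, in bytes

def issue_read(pre_read_size: int, request_size: int):
    """Return the issuing mode for the data read IO request and how many requests
    the application software needs to cover the whole pre-read region."""
    mode = "issue_in_advance" if pre_read_size > FIRST_THRESHOLD else "normal_rate"
    num_requests = -(-pre_read_size // request_size)   # ceiling division
    return mode, num_requests

# Example from the text: 1M of pre-read data read through 1K data read IO requests.
assert issue_read(1000 * 1000, 1000) == ("issue_in_advance", 1000)
```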
Step S508, the object storage device reads the data to be read, and processes the data to be read according to the attribute information of the data to be read.
In step S508, there are several specific implementations, as follows. First, the attribute information of the data to be read includes the size of the pre-read data, and the pre-read data includes the data to be read. If the size of the pre-read data is larger than the first threshold, the object storage device may first read the pre-read data and then send it to the client, where it is cached on the client side. When the application software subsequently issues a data read IO request for the same data, the corresponding data to be read can be read directly from the pre-read data cached at the client, saving data reading time.
Specifically, the data read IO request includes the LBA logical address of the pre-read data. The object storage device may obtain, according to this logical address, the corresponding physical address from the storage mapping relationship stored in the object storage device, and then obtain the pre-read data from that physical address. Further, the object storage device may send the pre-read data to the client. Accordingly, the client may write the pre-read data into its memory, or the client may store the pre-read data according to the attribute information of the data to be read in the data read IO request, as described in detail below.
If the size of the pre-read data is smaller than or equal to the first threshold, the object storage device can read the data to be read and then store it according to the attribute information of the data to be read. Optionally, after reading the data to be read, the object storage device may also send it to other object storage devices, so that those object storage devices can store the data to be read according to its attribute information; this is not limited in this application. In this case, the data to be read may not be stored in the memory of the client.
Specifically, when the size of the pre-read data is smaller than or equal to the first threshold, the data read IO request includes the LBA logical address of the data to be read. Correspondingly, the object storage device obtains the physical address corresponding to this logical address from the stored storage mapping relationship, and then reads the data to be read from that physical address. Further, the data to be read is sent to the upper-layer application software (i.e., the host on which the application software is deployed).
Second, when the attribute information of the data to be read includes the read information of the address to be read corresponding to the data to be read, the object storage device may process the data to be read according to that attribute information.
Specifically, the attribute information of the data to be read includes the read information of the address to be read corresponding to the data to be read, where the read information includes, but is not limited to, a read type or a read frequency. When the read information indicates the frequency with which data reads occur at the address to be read per unit time, and that frequency is greater than or equal to the second threshold, the object storage device can send the data to be read to the client to be cached in the client's memory, so that the data can be read directly from the client's memory next time, improving data reading speed. Accordingly, when the frequency is less than the second threshold, the object storage device may store the data to be read into its hard disk. Optionally, the object storage device may also send the data to be read to the client to be stored in the client's hard disk, and so on.
Optionally, when the read information is first read information, the first read information is used to indicate that the read frequency of the address to be read is in a third threshold range, or indicate that the read type of the address to be read is a third type (e.g., read often). Accordingly, the object storage device can write the data to be read into the second nonvolatile cache of the object storage device, so that the cached data to be read can be directly read from the second nonvolatile cache next time.
When the read information is second read information, the second read information is used to indicate that the read frequency of the address to be read corresponding to the data to be read is within the fourth threshold range, or that the read type of the address to be read is a fourth type (for example, to be read multiple times). The object storage device may accordingly store the data to be read into a third nonvolatile cache of the object storage device. Optionally, the object storage device may further set a corresponding expiration duration for the third nonvolatile cache, so that data whose storage duration exceeds the expiration duration is cleared from the third nonvolatile cache. The expiration duration may be set by the user or the system, for example, 1 day.
When the read information is third read information, the third read information is used to indicate that the read frequency of the address to be read corresponding to the data to be read is within the fifth threshold range, or that the read type of the address to be read is a fifth type (for example, rarely read or read only once). Accordingly, the object storage device may write or store the data to be read into its hard disk, and/or the object storage device may clear the data stored at a preset storage address. Specifically, the object storage device may determine, according to the data read IO request, the preset storage address (or nonvolatile memory) where the data to be cleared is located, and the preset storage address may be an LBA logical address. Correspondingly, the object storage device can clear the data stored at the preset storage address so as to release the data stored in the nonvolatile memory, or directly release all data stored in the nonvolatile memory.
The third threshold range, the fourth threshold range, and the fifth threshold range can be customized by the user or the system according to actual requirements; the lower limit of the third threshold range is greater than or equal to the upper limit of the fourth threshold range, and the lower limit of the fourth threshold range is greater than or equal to the upper limit of the fifth threshold range. The read-write speed of the nonvolatile cache is greater than that of the nonvolatile hard disk. The first nonvolatile cache and the second nonvolatile cache involved in this application may be the same cache or different caches deployed in the object storage device, which may be determined according to actual needs and is not limited here.
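As a non-authoritative sketch of this dispatch, assuming numeric read frequencies, in-memory stand-ins for the two caches and the hard disk, and illustrative threshold and expiration values:

```python
import time

class ObjectStorageDeviceReadPath:
    """Place data to be read into the second nonvolatile cache, the third nonvolatile
    cache (with expiration), or the hard disk according to the read information."""

    THIRD_RANGE_LOWER = 100.0        # first read information threshold (reads/min, assumed)
    FOURTH_RANGE_LOWER = 10.0        # second read information threshold (reads/min, assumed)
    EXPIRATION_SECONDS = 24 * 3600   # e.g. 1 day

    def __init__(self):
        self.second_nv_cache = {}
        self.third_nv_cache = {}     # lba -> (data, time stored)
        self.hard_disk = {}

    def place(self, lba: int, data: bytes, read_frequency: float) -> None:
        if read_frequency >= self.THIRD_RANGE_LOWER:
            self.second_nv_cache[lba] = data                # frequently read
        elif read_frequency >= self.FOURTH_RANGE_LOWER:
            self.third_nv_cache[lba] = (data, time.time())  # to be read several more times
        else:
            self.hard_disk[lba] = data                      # rarely read or read only once

    def expire(self) -> None:
        """Clear entries whose storage duration exceeds the expiration duration."""
        now = time.time()
        self.third_nv_cache = {lba: entry for lba, entry in self.third_nv_cache.items()
                               if now - entry[1] <= self.EXPIRATION_SECONDS}
```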
In other words, the object storage device may store or write data to be read corresponding to addresses to be read of different read frequencies or read types into different nonvolatile memories. For example, when the read type of the address to be read corresponding to the data to be read is frequently read, the object storage device may write the data to be read into the cache. When the read type of the address to be read is to be read multiple times (sequential read), the object storage device may write the data to be read into the same memory (such as a memory, a cache, or a disk) so as to improve the performance and efficiency of data reading; optionally, an expiration duration may also be set for that memory to limit how long data is kept there. When the read type of the data to be read is read only once, the object storage device may write the data to be read directly into the disk without caching it. When the read type of the address to be read corresponding to the data to be read is a type indicating that the cache should be released, the object storage device may obtain the preset storage address (e.g., a logical address) of the data to be cleared from the data read IO request, and clear the data stored at that preset storage address to release the cache, and so on.
By implementing the embodiment of the invention, the storage layout of the data is optimized by redefining the field of the DIF field, and compared with the prior art, the storage performance of the storage system can be improved, and the working efficiency of the storage system is improved.
Fig. 6 is a schematic structural diagram of a data processing apparatus according to an embodiment of the present invention. The apparatus 600 shown in fig. 6 includes a communication module 602 and a processing module 604. Wherein:
In one possible implementation, the data processing apparatus 600 is a host. In particular, the processing module 604 may be configured to control and manage the operations of the data processing apparatus 600. Illustratively, the processing module 604 is configured to support the host in performing step S202 in fig. 2, step S402 in fig. 4, step S502 in fig. 5, and/or other steps of the techniques described herein. The communication module 602 is used to support communication of the data processing apparatus 600 with other devices; for example, the communication module 602 is used to support the host in performing step S204 in fig. 2, step S404 in fig. 4, step S504 in fig. 5, and/or other steps of the techniques described herein.
In another possible implementation, the data processing apparatus 600 is an object storage device. In particular, the processing module 604 may be configured to control and manage the operations of the data processing apparatus 600. Illustratively, the processing module 604 is configured to enable the object storage device to perform steps S206 and S208 in fig. 2, step S410 in fig. 4, step S508 in fig. 5, and/or to perform other steps of the techniques described herein. The communication module 602 is used to support communication between the data processing apparatus 600 and other devices, for example, the communication module 602 is used for the object storage device to receive a data read IO request or a data write IO request sent by a client, and/or to perform other steps of the technology described herein.
In another possible implementation, the data processing device 600 is a client. In particular, the processing module 604 may be configured to control and manage the operations of the data processing apparatus 600. Illustratively, the processing module 604 is used to support the client in performing step S406 in fig. 4, and/or in performing other steps of the techniques described herein. The communication module 602 is used to support communication of the data processing apparatus 600 with other devices, e.g., the communication module 602 is used to perform step S408 in fig. 4, step S506 in fig. 5, and/or to perform other steps of the techniques described herein.
Optionally, the data processing apparatus 600 may further comprise a storage module 606 for storing program codes and data of the data processing apparatus 600.
The processing module 604 may be a Processor or a controller, such as a Central Processing Unit (CPU), a general purpose Processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof, and may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination implementing computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 602 may be a communication interface, a transceiver circuit, or the like, where the communication interface is a general term and may include one or more interfaces, such as an interface between the communication module and the processing module, or an interface between the object storage device and other devices (e.g., a host or a client). The storage module 606 may be a memory or another service or module for providing a storage function.
When the processing module 604 is a processor, the communication module 602 is a communication interface, and the storage module 606 is a memory, the data processing apparatus according to the embodiment of the present invention may be the data processing device shown in fig. 7. The processing module 604, the communication module 602, and the storage module 606 may also be implemented by software.
Referring to fig. 7, the data processing device 700 includes one or more processors 701, a communication interface 702, and a memory 703. Optionally, the data processing device 700 may also include a bus 704. The communication interface 702, the processor 701, and the memory 703 may be connected to each other by the bus 704; the bus 704 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus 704 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in fig. 7, but this does not mean there is only one bus or one type of bus. Wherein:
processor 701 may be comprised of one or more general-purpose processors, such as a Central Processing Unit (CPU). The processor 701 may be configured to run a program of processing functions in the associated program code. That is, the processor 701 executing the program code may implement the functions of the processing module. For the processing module, reference may be made to the related explanations in the foregoing embodiments.
In one possible embodiment, when the data processing apparatus 700 is a host, the processor 701 of the host is configured to run related program codes to implement the functions of the processing modules described above in this application. Or implement step S202 in fig. 2, step S402 in fig. 4, step S502 in fig. 5, and/or other steps for performing the techniques described herein, etc., as described above, and is not limited in this application.
In another possible implementation manner, when the data processing device 700 is an object storage device, the processor 701 of the object storage device is configured to run related program codes to implement the functions of the processing module described above in this application. Or implement steps S206 and S208 in fig. 2, step S410 in fig. 4, step S508 in fig. 5, and/or other steps for performing the techniques described herein, etc., as described above in this application, which is not limited to this.
In another possible embodiment, when the data processing device 700 is a client, the processor 701 of the client is configured to run related program code to implement the functions of the processing module described above in this application, or to implement step S406 in fig. 4 described above, and/or other steps for performing the techniques described herein, which are not limited in this application. The communication interface 702 may be a wired interface (e.g., an Ethernet interface) or a wireless interface (e.g., a cellular network interface or a wireless local area network interface) for communicating with other modules/devices. For example, in the embodiments of this application, the communication interface 702 may be specifically configured to obtain the attribute information of the data to be operated on, for example, the attribute information of the data to be read or the data to be written.
The memory 703 may include a Volatile Memory, such as a Random Access Memory (RAM); the memory may also include a Non-volatile Memory, such as a Read-Only Memory (ROM), a Flash Memory, a Hard Disk Drive (HDD), or a Solid-State Drive (SSD); the memory 703 may also include a combination of the above kinds of memories. The memory 703 may be used to store a set of program code, so that the processor 701 can call the program code stored in the memory 703 to implement the functions of the communication module and/or the processing module involved in the embodiments of the present invention.
It should be noted that fig. 7 is only one possible implementation manner of the embodiment of the present application, and in practical applications, the data processing apparatus may further include more or less components, which is not limited herein. For the content that is not shown or described in the embodiment of the present application, reference may be made to the related explanation in any of the foregoing embodiments of fig. 2 to fig. 5, which is not described herein again.
Embodiments of the present invention further provide a computer non-transitory storage medium having instructions stored therein, where when the instructions are run on a processor, the method flow shown in any one of fig. 2 to 5 is implemented.
Embodiments of the present invention further provide a computer program product, where when the computer program product runs on a processor, the method flow shown in any one of fig. 2 to 5 is implemented.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware or in software executed by a processor. The software instructions may consist of corresponding software modules that may be stored in Random Access Memory (RAM), flash Memory, Read Only Memory (ROM), Erasable Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), registers, a hard disk, a removable disk, a compact disc Read Only Memory (CD-ROM), or any other form of storage medium known in the art. An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be integral to the processor. The processor and the storage medium may reside in an ASIC. Additionally, the ASIC may reside in a computing device. Of course, the processor and the storage medium may also reside as discrete components in a computing device.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. And the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.

Claims (19)

1. A method of data processing, the method comprising:
the method comprises the steps that an object storage device receives a data read IO request, wherein the data read IO request comprises a Data Integrity Field (DIF), and the DIF is used for bearing attribute information of data to be read;
and processing the data to be read according to the attribute information of the data to be read.
2. The method according to claim 1, wherein the attribute information of the data to be read comprises a read granularity of the data to be read, which is used for indicating a size of the pre-read data including the data to be read;
the processing the data to be read according to the attribute information of the data to be read comprises:
when the size of the pre-read data is larger than or equal to a first threshold value, reading the pre-read data including the data to be read, and sending the pre-read data to a client to be cached in a memory of the client; or,
and when the size of the pre-read data is smaller than a first threshold value, reading the data to be read, and sending the data to be read to a host.
3. The method according to claim 1 or 2, wherein the attribute information of the data to be read comprises read information of an address to be read corresponding to the data to be read, the read information is used for indicating a frequency of data reading of the address to be read in a unit time,
the processing the data to be read according to the attribute information of the data to be read comprises:
when the read information is first read information, writing the data to be read into a cache of the object storage device; or,
when the read information is second read information, writing the data to be read into a cache of the object storage device, and setting an expiration time length in the cache of the object storage device to clear the data with the storage time length exceeding the expiration time length in the cache;
wherein the frequency indicated by the first read information is greater than the frequency indicated by the second read information.
4. A method of data processing, the method comprising:
the method comprises the steps that an object storage device receives a data write IO request, wherein the data write IO request comprises a Data Integrity Field (DIF), and the DIF is used for bearing attribute information of data to be written;
and processing the data to be written according to the attribute information of the data to be written.
5. The method according to claim 4, wherein the attribute information of the data to be written comprises writing information of an address to be written corresponding to the data to be written, the writing information is used for indicating a frequency of data writing at the address to be written in a unit time,
the processing the data to be written according to the attribute information of the data to be written comprises:
when the write-in information is first write-in information, the object storage device stores the data to be written into a cache of the object storage device; or,
when the write-in information is second write-in information, the object storage device stores the data to be written into a hard disk of the object storage device;
wherein the frequency indicated by the first write information is greater than the frequency indicated by the second write information.
6. The method according to claim 4 or 5, wherein the attribute information of the data to be written comprises a writing granularity of the data to be written, and is used for indicating a storage granularity adopted when the data to be written is stored; the method further comprises the following steps:
the client receives the data write IO request, and configures corresponding stripe information for the data to be written according to the attribute information of the data to be written, wherein the stripe information comprises a stripe unit used when the data to be written is stored and a physical address of the stripe unit;
the client sends a stripe unit carrying data to be stored and the physical address to the object storage device according to the stripe information, wherein the data to be stored is part of the data to be written;
the object storage device receives the stripe unit which is sent by the client and carries the data to be stored and the physical address;
the processing the data to be written according to the attribute information of the data to be written comprises:
and the object storage equipment stores the data to be stored into a hard disk of the object storage equipment according to the physical address of the strip unit.
7. The method of claim 6, further comprising: and the object storage equipment creates a storage mapping relation of the data to be stored according to the writing granularity of the data to be written, wherein the storage mapping relation comprises a mapping relation between a logical address and the physical address when the data to be stored is stored, and the logical address is associated with the address to be written.
8. A method of data processing, the method comprising:
the method comprises the steps that a host computer obtains attribute information of data to be operated and generates a data operation request according to the attribute information of the data to be operated;
and sending the data operation request to a client, wherein the data operation request comprises a Data Integrity Field (DIF), and the DIF is used for bearing attribute information of the data to be operated.
9. The method according to claim 8, wherein when the data operation request is a data read IO request, the attribute information of the data to be operated includes a read granularity of the data to be read and/or read information of an address to be read corresponding to the data to be read;
the reading granularity of the data to be read is used for indicating the size of the pre-read data including the data to be read;
and the reading information of the address to be read is used for indicating the frequency of data reading of the address to be read in unit time.
10. The method according to claim 8, wherein when the data operation request is a data write IO request, the attribute information of the data to be operated includes write granularity of the data to be written and/or write information of an address to be written corresponding to the data to be written;
the writing granularity of the data to be written is used for indicating the storage granularity adopted when the data to be written is stored;
the writing information of the address to be written is used for indicating the frequency of data writing of the address to be written in unit time.
11. A data processing device is characterized by comprising a communication module and a processing module; wherein,
the communication module is configured to receive a data read IO request, where the data read IO request includes a data integrity field DIF, and the DIF is used to carry attribute information of data to be read;
and the processing module is used for processing the data to be read according to the attribute information of the data to be read.
12. A data processing device is characterized by comprising a communication module and a processing module; wherein,
the communication module is configured to receive a data write IO request, where the data write IO request includes a data integrity field DIF, and the DIF is used to carry attribute information of data to be written;
and the processing module is used for processing the data to be written according to the attribute information of the data to be written.
13. A data processing apparatus, comprising a communication unit and a processing unit; wherein,
the processing unit is used for acquiring attribute information of data to be operated;
the communication unit is configured to send a data operation request to a client, where the data operation request includes a data integrity field DIF, and the DIF is used to carry attribute information of the data to be operated.
14. An object storage device comprising a processor, a memory; the processor and the memory establishing communication; wherein the memory is to store instructions; the processor is configured to call instructions in the memory to perform the method of any of claims 1-3 above.
15. An object storage device comprising a processor, a memory; the processor and the memory establishing communication; wherein the memory is to store instructions; the processor is used for calling the instruction in the memory and executing the method of any one of the preceding claims 4-7.
16. A host, comprising a processor, a memory; the processor and the memory establishing communication; wherein the memory is to store instructions; the processor is configured to call instructions in the memory to perform the method of any of claims 8-10 above.
17. A computer non-transitory storage medium storing a computer program, wherein the computer program when executed by a computing device implements the method of any one of claims 1 to 3.
18. A computer non-transitory storage medium storing a computer program, wherein the computer program when executed by a computing device implements the method of any of claims 4 to 7.
19. A computer non-transitory storage medium storing a computer program, wherein the computer program when executed by a computing device implements the method of any of claims 8 to 10.
CN201810940730.8A 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium Active CN110837479B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810940730.8A CN110837479B (en) 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium
PCT/CN2019/089582 WO2020034729A1 (en) 2018-08-17 2019-05-31 Data processing method, related device, and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810940730.8A CN110837479B (en) 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium

Publications (2)

Publication Number Publication Date
CN110837479A true CN110837479A (en) 2020-02-25
CN110837479B CN110837479B (en) 2023-09-01

Family

ID=69525077

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810940730.8A Active CN110837479B (en) 2018-08-17 2018-08-17 Data processing method, related equipment and computer storage medium

Country Status (2)

Country Link
CN (1) CN110837479B (en)
WO (1) WO2020034729A1 (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538461A (en) * 2020-04-21 2020-08-14 招商局金融科技有限公司 Data reading and writing method and device based on solid state disk cache and storage medium
CN111796776A (en) * 2020-07-08 2020-10-20 深圳忆联信息系统有限公司 Storage method, device, equipment and medium based on user customization or demand analysis
CN111796876A (en) * 2020-07-08 2020-10-20 深圳忆联信息系统有限公司 Method, device, equipment and medium for defining requirements on storage equipment based on user
CN114115697A (en) * 2020-08-26 2022-03-01 浙江宇视科技有限公司 Cloud storage data processing method and device, electronic equipment and storage medium

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113297203A (en) * 2020-07-15 2021-08-24 阿里巴巴集团控股有限公司 Data query and write-in method and device, computer storage medium and electronic equipment
CN116483737B (en) * 2023-04-18 2024-03-15 深圳市金玺智控技术有限公司 Data processing method and device, electronic equipment and storage medium


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102915340A (en) * 2012-02-29 2013-02-06 浙江工商大学 Expanded B+ tree-based object file system
CN103064639B (en) * 2012-12-28 2016-08-03 华为技术有限公司 Date storage method and device
CN104063344B (en) * 2014-06-20 2018-06-26 华为技术有限公司 A kind of method and network interface card for storing data
US9870322B2 (en) * 2015-11-12 2018-01-16 International Business Machines Corporation Memory mapping for object-based storage devices
CN106713250B (en) * 2015-11-18 2019-08-20 杭州华为数字技术有限公司 Data access method and device based on distributed system

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101622596A (en) * 2006-12-06 2010-01-06 Fusion Multisystems, Inc. (dba Fusion-io) Apparatus, system, and method for managing data in a storage device with an empty data token directive
CN102405460A (en) * 2009-02-11 2012-04-04 Infinidat Ltd. Virtualized storage system and method of operating it
US20120110338A1 (en) * 2010-10-27 2012-05-03 Max Planck Gesellschaft Zur Foerderung Der Wissenschaften Protecting the Integrity and Privacy of Data with Storage Leases
CN102307221A (en) * 2011-03-25 2012-01-04 G-Cloud Technology Co., Ltd. Cloud storage system and implementation method thereof
US20130054889A1 (en) * 2011-08-26 2013-02-28 Vmware, Inc. Computer system accessing object storage system
US20130226887A1 (en) * 2012-02-24 2013-08-29 Xyratex Technology Limited Method of, and apparatus for, improved data integrity in a networked storage system
CN103197899A (en) * 2012-04-02 2013-07-10 Microsoft Corporation Life and performance enhancement of storage based on flash memory
CN103797770A (en) * 2012-12-31 2014-05-14 Huawei Technologies Co., Ltd. Method and system for sharing storage resources
CN105094691A (en) * 2014-05-21 2015-11-25 Huawei Technologies Co., Ltd. Data manipulation methods and system, and devices
US20180113651A1 (en) * 2014-10-29 2018-04-26 International Business Machines Corporation Transferring data encoding functions in a distributed storage network
CN107209666A (en) * 2014-12-12 2017-09-26 Microsoft Technology Licensing, LLC Computer system
CN105993013A (en) * 2014-12-27 2016-10-05 Huawei Technologies Co., Ltd. Data processing method, apparatus and system
CN107636759A (en) * 2015-03-20 2018-01-26 Burlywood LLC Storage emulation in a storage controller
CN107179878A (en) * 2016-03-11 2017-09-19 EMC Corporation Method and apparatus for data storage based on application optimization
CN107590395A (en) * 2017-08-15 2018-01-16 State Grid Corporation of China Multi-layer data encryption method, apparatus, device and system suitable for a cloud environment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Liu Xing et al.: "A write-back I/O scheduler based on synchronous small-data writes" *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111538461A (en) * 2020-04-21 2020-08-14 China Merchants Financial Technology Co., Ltd. Data reading and writing method and device based on solid state disk cache, and storage medium
CN111538461B (en) * 2020-04-21 2023-04-07 China Merchants Financial Technology Co., Ltd. Data reading and writing method and device based on solid state disk cache, and storage medium
CN111796776A (en) * 2020-07-08 2020-10-20 Shenzhen Unionmemory Information System Co., Ltd. Storage method, device, equipment and medium based on user customization or demand analysis
CN111796876A (en) * 2020-07-08 2020-10-20 Shenzhen Unionmemory Information System Co., Ltd. Method, device, equipment and medium for user-defined requirements on storage equipment
CN114115697A (en) * 2020-08-26 2022-03-01 Zhejiang Uniview Technologies Co., Ltd. Cloud storage data processing method and device, electronic equipment and storage medium
CN114115697B (en) * 2020-08-26 2024-03-22 Zhejiang Uniview Technologies Co., Ltd. Cloud storage data processing method and device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN110837479B (en) 2023-09-01
WO2020034729A1 (en) 2020-02-20

Similar Documents

Publication Publication Date Title
CN110837479B (en) Data processing method, related equipment and computer storage medium
US20230384963A1 (en) Efficient Creation And Management Of Snapshots
US9417811B2 (en) Efficient inline data de-duplication on a storage system
US9798655B2 (en) Managing a cache on storage devices supporting compression
CN108028833B (en) NAS data access method, system and related equipment
US9639459B2 (en) I/O latency and IOPs performance in thin provisioned volumes
RU2651219C2 (en) Computer, control device and data processing method
CN111752480A (en) Data writing method, data reading method, related equipment and system
US20170153909A1 (en) Methods and Devices for Acquiring Data Using Virtual Machine and Host Machine
CN104317716B (en) Data transmission method and distributed node equipment between distributed node
KR20170010810A (en) Method, device and user equipment for reading/writing data in nand flash
CN106657182B (en) Cloud file processing method and device
US20170168956A1 (en) Block cache staging in content delivery network caching system
CN111803917A (en) Resource processing method and device
WO2020083067A1 (en) Resource management method and apparatus
US11221770B2 (en) Providing a dynamic random-access memory cache as second type memory
US20180095788A1 (en) Scheduling operations for a storage device
US9788192B2 (en) Making subscriber data addressable as a device in a mobile data network
US9804968B2 (en) Storage system and data writing method
CN109088913B (en) Method for requesting data and load balancing server
EP4170495A1 (en) Resource isolation in computational storage devices
US11914527B2 (en) Providing a dynamic random-access memory cache as second type memory per application process
US11853593B2 (en) Shared memory protection method for securing MMIO commands
US20230385250A1 (en) Full allocation volume to deduplication volume migration in a storage system
CN117806570B (en) Online memory expansion method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220207

Address after: 550025 Huawei Cloud Data Center, Jiaoxinggong Road, Qianzhong Avenue, Gui'an New District, Guiyang City, Guizhou Province

Applicant after: Huawei Cloud Computing Technology Co.,Ltd.

Address before: 518129 Bantian HUAWEI headquarters office building, Longgang District, Guangdong, Shenzhen

Applicant before: HUAWEI TECHNOLOGIES Co.,Ltd.

GR01 Patent grant