CN115827508A

CN115827508A - Data processing method, system, equipment and storage medium

Info

Publication number: CN115827508A
Application number: CN202310026962.3A
Authority: CN
Inventors: 刘杰; 侯斌; 安祥文; 张柯
Original assignee: Suzhou Inspur Intelligent Technology Co Ltd
Current assignee: Suzhou Inspur Intelligent Technology Co Ltd
Priority date: 2023-01-09
Filing date: 2023-01-09
Publication date: 2023-03-21
Anticipated expiration: 2043-01-09
Also published as: CN115827508B

Abstract

The invention discloses a data processing method comprising the following steps: in response to receiving a read IO, acquiring the number of times that the data object corresponding to the read IO has been read within a preset time period; in response to the number of reads being not less than a threshold, starting pre-reading and reading all the remaining data in the corresponding data object, starting from the address offset in the read IO; and selecting, from all the remaining data and according to the address length in the read IO, the data to be read corresponding to the read IO, returning that data to the upper layer, and placing the rest of the data into the cache of the bluestore layer. The invention also discloses a system, a device and a storage medium. The scheme provided by the invention can improve small-IO sequential read performance, reduce frequent disk accesses, extend disk service life, reduce power consumption and hardware cost, and improve the competitiveness of the product.

Description

Data processing method, system, equipment and storage medium

Technical Field

The present invention relates to the field of storage, and in particular, to a data processing method, system, device, and storage medium.

Background

With the continuous development of information technology, data is increasingly valued as a precious resource, and how to process data quickly and obtain the expected results has become one of the key problems in turning resources into assets. Data is generated by people's activities in work and life; collecting, analyzing and processing it yields useful information and converts resources into assets, which has driven the rapid development of big data and high-performance computing. Data storage, one of the core elements of data resources, has likewise entered a period of rapid development. A traditional network storage system uses a centralized storage server to store all data; that server becomes the bottleneck of system performance and the focal point of reliability and security concerns, and cannot meet the requirements of large-scale storage applications. A distributed network storage system adopts a scalable architecture that improves reliability, availability and access efficiency and is easy to expand, and is therefore adopted by more and more enterprises. A distributed storage system typically consists of 3 to N nodes and provides high-performance, mass data storage.

In block storage scenarios of distributed storage, the data access pattern of some applications is sequential small-IO (input/output) reads. Sequential small-IO reads against the same object effectively turn parallel processing into serial processing: concurrency drops and each read request takes longer, so sequential small-IO read performance is poor and the performance advantage of distributed storage cannot be fully exploited. The industry generally addresses this problem by adding non-volatile cache or higher-performance CPUs, which requires additional hardware or CPU upgrades and increases cost.

Disclosure of Invention

In view of the above, in order to overcome at least one aspect of the above problems, an embodiment of the present invention provides a data processing method, including the following steps performed at a bluestore layer:

in response to receiving a read IO, acquiring the number of times that a data object corresponding to the read IO is read within a preset time period;

in response to the number of reads being not less than a threshold, starting pre-reading and reading all the remaining data in the corresponding data object, starting from the address offset in the read IO;

and selecting, from all the remaining data and according to the address length in the read IO, the data to be read corresponding to the read IO, returning that data to the upper layer, and placing the rest of the data into the cache of the bluestore layer.

In some embodiments, further comprising:

an access counter for recording the number of times read and a time stamp recorder for recording the read time are added to the metadata of each data object.

In some embodiments, in response to receiving a read IO, obtaining the number of times that a data object corresponding to the read IO is read within a preset time period, further includes:

in response to receiving the read IO, determining whether the data to be read hits in the cache;

and directly returning the data to be read to an upper layer in response to the data to be read being hit in the cache.

In some embodiments, further comprising:

and in response to the data to be read not being hit in the cache, reading an access counter and a time stamp recorder in the metadata corresponding to the corresponding data object.

In some embodiments, in response to that the number of times of reading is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further includes:

and in response to the access counter being 0, setting the access counter in the corresponding metadata to be 1 and not starting pre-reading.

In some embodiments, further comprising:

and directly reading the data to be read from the corresponding data object and returning to the upper layer.

In some embodiments, further comprising:

and updating the timestamp recorder in the corresponding metadata according to the current time.

in response to the access counter not being 0, obtaining a timestamp recorder in the corresponding metadata and comparing the timestamp recorder with the current time;

and setting the access counter of the corresponding data object to be 1 and not starting pre-reading in response to the difference value of the two values being larger than the preset time period.

In some embodiments, further comprising:

and directly reading the data to be read from the corresponding data object and returning to an upper layer.

In some embodiments, further comprising:

and in response to the difference value being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.

In some embodiments, further comprising:

judging whether the access counter of the corresponding data object is not less than a threshold value;

and in response to the access counter being not less than the threshold, starting pre-reading and reading all the remaining data in the corresponding data object, starting from the address offset in the read IO.

In some embodiments, further comprising:

and in response to the access counter being less than the threshold, not starting pre-reading, directly reading the data to be read from the corresponding data object and returning it to the upper layer.

In some embodiments, further comprising:

acquiring and parsing the object name of the corresponding data object to obtain an object number and a storage volume name;

determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs in the storage pool;

and determining the number of the next data object according to the object number and the number of PGs, and thereby determining the name of the next data object.

In some embodiments, further comprising:

and reading all data in the next data object and placing the data into the cache of the bluestore layer.

In some embodiments, further comprising:

acquiring a preset size for the data to be pre-read each time;

determining whether all the data remaining in the corresponding data object is smaller than the preset pre-read size;

and in response to all the data remaining in the corresponding data object being smaller than the preset pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the preset pre-read size.

Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a data processing system, including:

an acquisition module configured to, in response to receiving a read IO, acquire the number of times that the data object corresponding to the read IO has been read within a preset time period;

a pre-reading module configured to, in response to the number of reads being not less than a threshold, start pre-reading and read all the remaining data in the corresponding data object, starting from the address offset in the read IO;

and a return module configured to select, from all the remaining data and according to the address length in the read IO, the data to be read corresponding to the read IO, return that data to the upper layer, and place the rest of the data into the cache of the bluestore layer.

Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:

at least one processor; and

a memory storing a computer program operable on the processor, the processor executing the program to perform the steps of any of the data processing methods as described above.

Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any of the data processing methods described above.

The invention has one of the following beneficial technical effects: according to the scheme provided by the invention, the bluestore layer counts how many times the same object is read within a threshold time; for an object judged to be hot data, the read range is expanded and the data read is placed into the memory cache. This implements data pre-reading and converts multiple small-IO reads into one large-IO read. The next object can also be identified automatically and its data pre-read into memory, so that subsequent sequential reads of the object are served directly from memory rather than from disk each time, which reduces the latency of each read and improves small-IO sequential read performance.

Drawings

In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other embodiments from these drawings without creative effort.

FIG. 1 is a schematic flow chart of a data processing method according to an embodiment of the present invention;

FIG. 2 is a block diagram of a data processing system according to an embodiment of the present invention;

FIG. 3 is a schematic structural diagram of a computer device provided in an embodiment of the present invention;

FIG. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention are described in further detail with reference to the accompanying drawings.

It should be noted that all uses of "first" and "second" in the embodiments of the present invention serve to distinguish two entities or parameters that share the same name but are not identical. "First" and "second" are used merely for convenience of description, should not be construed as limiting the embodiments of the present invention, and are not explained again in the following embodiments.

In an embodiment of the present invention, distributed block storage is an extensible storage architecture. The storage architecture can realize cross-device data distribution and can share load by a plurality of servers. In physical and virtual machine applications, block storage may be used as a long-term storage device, typically including high-level services such as backup and snapshot.

Object: stored data is divided into a plurality of objects, each of which has an object id. The object size is configurable and defaults to 4 MB. An object can be regarded as the minimum storage unit of the distributed storage.

OSD: Object Storage Device. Its main functions are to store, replicate, rebalance and recover data, and to exchange heartbeat checks with other OSDs. Typically one hard disk corresponds to one OSD, which manages the storage of that disk; a partition may also serve as an OSD.

Bluestore: the object storage engine of the distributed storage, which manages the underlying data.

Onode: the data structure that records object metadata information; each object corresponds to one onode.

Small IO: IO reads smaller than a preset size (e.g., 128 KB) are defined as small IO reads.

PG: a PG is a collection of objects and the basic unit that makes up a storage pool. Features of the storage pool, such as multi-replica and erasure-code data backup strategies, are ultimately implemented through PGs.

Shard: the set of OSDs over which one PG is distributed is called a shard set, and each OSD in the set is called a shard. The shards are numbered starting from 0, and how many there are depends on the data backup strategy. For example, under a three-replica backup strategy a shard set contains 3 OSDs, with shard numbers 0, 1 and 2.

Volume: data in block storage is stored in volumes in the form of blocks, and a volume is attached to a node. It provides applications with larger storage capacity and higher reliability and performance. The volume formed by these blocks is mapped into the operating system and controlled by the file system layer.

Object name in block storage: an object in block storage is generally 4 MB in size. A volume is divided into objects of 4 MB each, the objects are numbered from 0 according to their offset within the volume, and the object name consists of two parts: the volume name and the object number.
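The naming rule above can be illustrated with a short sketch. It maps a byte offset within a volume to an object name and an offset inside that object. The `<volume>.<number>` layout, the function name `object_for_offset`, and its parameters are assumptions made for illustration only; the description states only that the name is composed of the volume name and the object number.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # 4 MB default object size stated in the description


def object_for_offset(volume_name: str, volume_offset: int,
                      object_size: int = OBJECT_SIZE) -> tuple[str, int]:
    """Return (object_name, offset_inside_object) for a byte offset in a volume."""
    if volume_offset < 0:
        raise ValueError("volume_offset must be non-negative")
    # Objects are numbered from 0 according to their offset within the volume.
    number, inner_offset = divmod(volume_offset, object_size)
    # Hypothetical name layout: "<volume>.<number>" (the separator is an assumption).
    return f"{volume_name}.{number}", inner_offset
```

For example, under these assumptions a read at byte offset 4 MB + 10 of a volume named `vol1` falls at offset 10 of the object numbered 1.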

According to an aspect of the present invention, an embodiment of the present invention proposes a data processing method which, as shown in FIG. 1, may include performing the following steps at a bluestore layer:

S1, in response to receiving a read IO, acquiring the number of times that the data object corresponding to the read IO has been read within a preset time period;

S2, in response to the number of reads being not less than a threshold, starting pre-reading and reading all the remaining data in the corresponding data object, starting from the address offset in the read IO;

and S3, selecting, from all the remaining data and according to the address length in the read IO, the data to be read corresponding to the read IO, returning that data to the upper layer, and placing the rest of the data into the cache of the bluestore layer.

The scheme provided by the invention can improve the small IO sequential reading performance, reduce frequent accesses to the disk, prolong the service life and reduce the power consumption, reduce the hardware cost and improve the competitiveness of the product.

In some embodiments, further comprising:

Specifically, an access counter and a timestamp recorder may be added to the data structure that records the metadata information of an object, so as to record how many times, and when, the object is read by small IOs within a period of time.

Specifically, after receiving a read request, bluestore first tries to read the data from the memory cache. If the request hits the memory cache, bluestore returns the data directly to the upper layer; if it misses, the data is read from the disk.
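The hit-or-miss decision described above can be sketched as follows. This is a simplified model under stated assumptions, not the actual bluestore cache: the cache is a plain dictionary keyed by a hypothetical `(object_name, start_offset)` pair, the disk is simulated by a dictionary of byte strings, and `read_with_cache` is an illustrative name.

```python
def read_with_cache(cache: dict, disk: dict, name: str,
                    offset: int, length: int) -> tuple[bytes, bool]:
    """Return (data, hit). A request hits when one cached extent covers it fully."""
    end = offset + length
    for (obj, start), data in cache.items():
        # Hit only if the cached extent spans the whole requested range.
        if obj == name and start <= offset and end <= start + len(data):
            begin = offset - start
            return data[begin:begin + length], True
    # Miss: fall back to the (simulated) disk.
    return disk[name][offset:end], False
```

In this sketch a request that is only partly covered by a cached extent is treated as a miss; the description does not specify how partial hits are handled.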

In some embodiments, further comprising:

In some embodiments, in response to the number of times of being read being not less than a threshold, starting pre-reading and reading all data remaining in the corresponding data object according to an address offset in the read IO, further including:

In some embodiments, further comprising:

and in response to the difference between the two being less than the preset time period, adding 1 to the access counter of the corresponding data object.

In some embodiments, further comprising:

Specifically, pre-reading is started when, within a preset time period (e.g., 3 seconds), the accumulated number of times that the same object is read from the disk by small IOs reaches a threshold (e.g., 3 times).

When the data object is read for the first time, the counter value is 0, so the counter of the object is set to 1 and the timestamp is updated to the current time; no pre-read operation is performed at this point.

If the data object is not being read for the first time, the counter is not 0, and the recorded timestamp is compared with the current time. If the difference exceeds the preset time period, the counter is set to 1, the timestamp is updated to the current time, and no pre-read operation is performed. If the difference does not exceed the preset time period, the counter is incremented by 1, the timestamp is updated to the current time, and the counter is checked against the threshold: if the threshold is reached, pre-reading is started; otherwise it is not.
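The counter and timestamp logic described above can be sketched as follows. The class `OnodeStats`, the function `should_preread`, and the default values (a 3-second window and a threshold of 3, taken from the examples in the text) are illustrative assumptions. The description does not say how a difference exactly equal to the preset time period is handled; this sketch treats it as still within the window.

```python
from dataclasses import dataclass


@dataclass
class OnodeStats:
    """Hypothetical per-object metadata fields: access counter and timestamp recorder."""
    access_count: int = 0
    last_access: float = 0.0


def should_preread(stats: OnodeStats, now: float,
                   window: float = 3.0, threshold: int = 3) -> bool:
    """Update the counter and timestamp for one cache-miss read; return True to pre-read."""
    if stats.access_count == 0:
        # First read of the object: start counting, no pre-read.
        stats.access_count = 1
        start = False
    elif now - stats.last_access > window:
        # Previous read is too old: restart counting, no pre-read.
        stats.access_count = 1
        start = False
    else:
        # Read within the window: count it and compare against the threshold.
        stats.access_count += 1
        start = stats.access_count >= threshold
    # The timestamp is updated to the current time in every branch.
    stats.last_access = now
    return start
```

With these defaults, three reads of the same object at one-second intervals start pre-reading on the third read, while a read arriving more than three seconds after the previous one resets the counter to 1.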

In some embodiments, further comprising:

and reading all data in the next data object and placing the data into the cache of the bluestore layer.

Specifically, after pre-reading is started, the read still begins at the read position (offset) of the read IO, but the read length is extended from the requested length to all the data remaining in the object.

After pre-reading is started, the next object to be read on the same PG can also be identified automatically from the current object name, and all data of that object can be pre-read on the shard. The specific steps are as follows: parse the object name to obtain the volume name and the object number; the current object number plus the number of PGs in the storage pool is the number of the next object on the shard, which gives the object name of the next object. The data of the whole object is then pre-read according to the obtained object name.
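The rule for deriving the next object name can be sketched as follows. As before, the `<volume>.<number>` name layout is an assumption made for illustration, as are the function name `next_object_name` and the way the number of PGs is passed in; the description states only that the next object number equals the current object number plus the number of PGs in the storage pool.

```python
def next_object_name(object_name: str, pg_count: int) -> str:
    """Derive the name of the next object on the same shard from the current name."""
    # Split on the last separator so that volume names containing "." still parse.
    volume, sep, number = object_name.rpartition(".")
    if not sep or not number.isdigit():
        raise ValueError(f"unexpected object name: {object_name!r}")
    if pg_count <= 0:
        raise ValueError("pg_count must be positive")
    # Next object number = current object number + number of PGs in the pool.
    return f"{volume}.{int(number) + pg_count}"
```

For instance, with 128 PGs in the pool, the object after `vol1.5` on the same shard would be `vol1.133` under these assumptions.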

In some embodiments, further comprising:

acquiring a preset size for the data to be pre-read each time;

Specifically, in some scenarios not all data of the next object need be pre-read. For example, a size for each pre-read may be set; if all the data remaining in the data object containing the data to be read is smaller than that size, data of the corresponding size is read from the next data object so that the total pre-read data reaches the size set for each pre-read.
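The arithmetic behind this variant can be sketched as follows. The function name `plan_preread` and its parameters are hypothetical. The sketch assumes that the current object always contributes all of its remaining data, as in step S2, and that the next object only makes up any shortfall; the description does not state what happens when the remaining data already exceeds the configured pre-read size, so that case is an interpretation.

```python
MIB = 1024 * 1024


def plan_preread(offset: int, object_size: int, preread_size: int) -> tuple[int, int]:
    """Return (bytes_from_current_object, bytes_from_next_object) for one pre-read."""
    if not 0 <= offset <= object_size:
        raise ValueError("offset must lie within the object")
    # All data remaining in the current object, starting at the read offset.
    remaining = object_size - offset
    # Shortfall relative to the configured pre-read size, if any.
    shortfall = max(0, preread_size - remaining)
    # The next object can supply at most one whole object.
    return remaining, min(shortfall, object_size)
```

For a 4 MB object read at offset 3 MB with a 2 MB pre-read size, the sketch takes 1 MB from the current object and 1 MB from the next one.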

In some embodiments, after bluestore reads the remaining data of the object locally, it returns only data of the requested length (length) to the upper layer to ensure data consistency, and places the rest of the data into the bluestore cache. When the data cached by bluestore reaches a certain threshold, the data that was added to the cache earliest is trimmed to release cache space.
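The split between returned data and cached data, together with trimming of the oldest entries, can be sketched as follows. `FifoCache` and `split_preread` are hypothetical names, the byte-based capacity stands in for the unspecified threshold, and the cache key `(object_name, offset)` is an assumption; this is a simplified model rather than the bluestore cache itself.

```python
from collections import OrderedDict


class FifoCache:
    """Byte-bounded cache that trims the entries added earliest."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.entries: OrderedDict = OrderedDict()  # (object_name, offset) -> bytes

    def put(self, key, data: bytes) -> None:
        old = self.entries.pop(key, None)
        if old is not None:
            self.used -= len(old)
        self.entries[key] = data
        self.used += len(data)
        # Trim from the oldest end until the cache fits its capacity again.
        while self.used > self.capacity and self.entries:
            _, evicted = self.entries.popitem(last=False)
            self.used -= len(evicted)


def split_preread(object_name: str, offset: int, length: int,
                  data: bytes, cache: FifoCache) -> bytes:
    """Return the requested bytes and cache the remainder of the pre-read data."""
    requested, rest = data[:length], data[length:]
    if rest:
        # The cached extent starts right after the bytes returned to the caller.
        cache.put((object_name, offset + length), rest)
    return requested
```

Here `data` is the result of the extended read starting at `offset`, so only its first `length` bytes are returned to the upper layer and the remainder is kept for later sequential reads.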

Note that the pre-reading performed by the bluestore layer is transparent to the upper layers.

According to the scheme provided by the invention, the bluestore layer counts how many times the same object is read within a threshold time; for an object judged to be hot data, the read range is expanded and the data read is placed into the memory cache. This implements data pre-reading and converts multiple small-IO reads into one large-IO read. The next object can also be identified automatically and its data pre-read into memory, so that subsequent sequential reads of the object are served directly from memory rather than from disk each time, which reduces the latency of each read and improves small-IO sequential read performance.

Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a data processing system 400, as shown in fig. 2, including:

the obtaining module 401 is configured to, in response to receiving a read IO, obtain the number of times that a data object corresponding to the read IO is read within a preset time period;

a pre-reading module 402, configured to start pre-reading and read all remaining data in the corresponding data object according to the address offset in the read IO in response to that the number of times of reading is not less than a threshold;

a returning module 403, configured to select, from all the remaining data and according to the address length in the read IO, the data to be read corresponding to the read IO, return that data to the upper layer, and place the rest of the data into the cache of the bluestore layer.

In some embodiments, further comprising a metadata module configured to:

In some embodiments, the acquisition module 401 is further configured to:

In some embodiments, the pre-read module 402 is further configured to:

In some embodiments, the read-ahead module 402 is further configured to:

In some embodiments, the pre-read module 402 is further configured to:

determining a corresponding storage pool according to the storage volume name and acquiring the number of PG in the storage pool;

In some embodiments, the pre-read module 402 is further configured to:

acquiring a preset size for the data to be pre-read each time;

According to the scheme provided by the invention, the bluestore layer counts how many times the same object is read within a threshold time; for an object judged to be hot data, the read range is expanded and the data read is placed into the memory cache. This implements data pre-reading and converts multiple small-IO reads into one large-IO read. The next object can also be identified automatically and its data pre-read into memory, so that subsequent sequential reads of the object are served directly from memory rather than from disk each time, which reduces the latency of each read and improves small-IO sequential read performance.

Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer apparatus 501, comprising:

at least one processor 520; and

a memory 510, the memory 510 storing a computer program 511 executable on the processor, the processor 520 executing the program to perform the steps of:

in response to receiving a read IO, acquiring the number of times that the data object corresponding to the read IO has been read within a preset time period;

and in response to the number of reads being not less than a threshold, starting pre-reading and reading all the remaining data in the corresponding data object, starting from the address offset in the read IO.

In some embodiments, further comprising:

an access counter for recording the number of times read and a time stamp recorder for recording the time of reading are added to the metadata of each data object.

In some embodiments, further comprising:

acquiring a preset size for the data to be pre-read each time;

Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, the computer-readable storage medium 601 stores a computer program 610, and the computer program 610 performs the following steps when executed by a processor:

In some embodiments, further comprising:

and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.

In some embodiments, further comprising:

and in response to the difference between the two being larger than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.

In some embodiments, further comprising:

and in response to the access counter being not less than the threshold, starting pre-reading and reading all the remaining data in the corresponding data object, starting from the address offset in the read IO.

In some embodiments, further comprising:

acquiring a preset size for the data to be pre-read each time;

Finally, it should be noted that, as will be understood by those skilled in the art, all or part of the processes of the methods of the above embodiments may be implemented by a computer program, which may be stored in a computer-readable storage medium, and when executed, may include the processes of the embodiments of the methods described above.

Further, it should be appreciated that the computer-readable storage media (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.

Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.

The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the present disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments of the invention may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.

It should be understood that, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.

The numbers of the embodiments disclosed in the embodiments of the present invention are merely for description, and do not represent the merits of the embodiments.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

Those of ordinary skill in the art will understand that the discussion of any embodiment above is exemplary only and is not intended to imply that the scope of the disclosure of the embodiments of the invention, including the claims, is limited to these examples. Within the idea of the embodiments of the invention, technical features in the above embodiment or in different embodiments may also be combined, and there are many other variations of the different aspects of the embodiments of the invention as described above, which are not provided in detail for the sake of brevity. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of the embodiments of the present invention are intended to be included within the scope of the embodiments of the present invention.

Claims

1. A data processing method, comprising performing the following steps at a bluestore layer:

2. The method of claim 1, further comprising:

3. The method of claim 2, wherein in response to receiving a read IO, obtaining a number of times that a data object corresponding to the read IO is read within a preset time period, further comprising:

in response to receiving the read IO, determining whether the data to be read hits in the cache;

4. The method of claim 3, further comprising:

5. The method of claim 4, wherein in response to the number of times read being not less than a threshold, starting a pre-read and reading all data remaining in the corresponding data object according to an address offset in the read IO, further comprising:

6. The method of claim 5, further comprising:

7. The method of claim 5, further comprising:

8. The method of claim 4, wherein in response to the number of times read being not less than a threshold, initiating a pre-read and reading all data remaining in the corresponding data object according to an address offset in the read IO, further comprising:

9. The method of claim 8, further comprising:

10. The method of claim 8, further comprising:

11. The method of claim 8, further comprising:

12. The method of claim 11, further comprising:

13. The method of claim 12, further comprising:

and in response to the access counter being less than the threshold, not starting pre-reading, directly reading the data to be read from the corresponding data object, and returning it to the upper layer.

14. The method of claim 11 or 13, further comprising:

15. The method of claim 1, wherein in response to the number of times read being not less than a threshold, starting a pre-read and reading all data remaining in the corresponding data object according to an address offset in the read IO, further comprising:

and determining the number of the next data object according to the object number and the number of PGs, and thereby determining the name of the next data object.

16. The method of claim 15, further comprising:

17. The method of claim 15, further comprising:

acquiring a preset size for the data to be pre-read each time;

18. A data processing system, comprising:

and a return module configured to select, from all the remaining data and according to the address length in the read IO, the data to be read corresponding to the read IO, return that data to the upper layer, and place the rest of the data into the cache of the bluestore layer.

19. A computer device, comprising:

at least one processor; and

memory storing a computer program operable on the processor, wherein the processor executes the program to perform the steps of the method according to any of claims 1-17.

20. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, is adapted to carry out the steps of the method according to any one of claims 1-17.