CN115827508B - Data processing method, system, equipment and storage medium - Google Patents
Data processing method, system, equipment and storage medium
- Publication number: CN115827508B (application CN202310026962.3A)
- Authority
- CN
- China
- Prior art keywords
- read
- data
- reading
- data object
- response
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a data processing method comprising the following steps: in response to receiving a read IO, acquiring the number of times the data object corresponding to the read IO has been read within a preset time period; in response to the read count being not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to the address offset in the read IO; and screening the data to be read corresponding to the read IO from the remaining data according to the address length in the read IO, returning the data to be read to the upper layer, and placing the remaining data into the cache of the BlueStore layer. The invention also discloses a corresponding system, device, and storage medium. The scheme provided by the invention can improve sequential small-IO read performance, reduce frequent disk accesses, prolong disk service life, lower power consumption, reduce hardware cost, and improve product competitiveness.
Description
Technical Field
The present invention relates to the field of storage, and in particular, to a data processing method, system, device, and storage medium.
Background
With the continued development of information technology, data has become an increasingly important and precious resource, and how to process data resources quickly and obtain the desired results is one of the key issues in turning resources into assets. People's activities in work and life constantly generate data; by collecting, analyzing, and processing that data, useful information can be obtained and the conversion from resources to assets realized, which has catalyzed the rapid development of big data and high-performance computing. Data storage has accordingly become one of the core elements of data resources during this period of rapid development. A traditional network storage system uses a centralized storage server to hold all data; the storage server becomes a bottleneck for system performance as well as a single point for reliability and security, and cannot meet the requirements of large-scale storage applications. A distributed network storage system adopts a scalable architecture, which improves the reliability, availability, and access efficiency of the system, is easy to extend, and has been accepted by more and more enterprises. Distributed storage systems typically have from 3 to N nodes to provide high-performance, mass data storage.
In block-device usage of distributed storage, the data access pattern of some application scenarios is sequential small-IO reads. Sequential small-IO reads of the same object effectively turn parallel processing into serial processing, reducing concurrency, while each read request spends considerable time reading the disk; as a result, sequential small-IO read performance is low and the performance advantage of distributed storage cannot be fully exploited. The industry typically addresses this by adding non-volatile caches or higher-performance CPUs, which requires additional hardware and CPU upgrades and in turn increases cost.
Disclosure of Invention
In view of this, in order to overcome at least one aspect of the above-mentioned problems, an embodiment of the present invention proposes a data processing method, including performing the following steps at the BlueStore layer:
in response to receiving a read IO, acquiring the number of times the data object corresponding to the read IO has been read within a preset time period;
in response to the read count being not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to the address offset in the read IO;
and screening the data to be read corresponding to the read IO from the remaining data according to the address length in the read IO, returning the data to be read to the upper layer, and placing the remaining data into the cache of the BlueStore layer.
In some embodiments, further comprising:
an access counter for recording the read count is added to the metadata of each data object, together with a timestamp recorder for recording the read time.
In some embodiments, in response to receiving a read IO, acquiring a number of times that a data object corresponding to the read IO is read in a preset time period, further includes:
in response to receiving the read IO, judging whether the data to be read can be hit in the cache;
and in response to the data to be read being hit in the cache, directly returning the data to be read to the upper layer.
In some embodiments, further comprising:
and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter being 0, setting the access counter in the corresponding metadata to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter not being 0, acquiring the timestamp recorder in the corresponding metadata and comparing it with the current time;
and in response to the difference between the recorded timestamp and the current time being greater than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, further comprising:
and in response to the difference between the recorded timestamp and the current time being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.
In some embodiments, further comprising:
judging whether the access counter of the corresponding data object is not less than the threshold;
and in response to the access counter being not less than the threshold, starting pre-reading and reading all remaining data in the corresponding data object according to the address offset in the read IO.
In some embodiments, further comprising:
and in response to pre-reading not being started, directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
acquiring and parsing the object name of the corresponding data object to obtain the object number and the storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs in the storage pool;
and determining the number of the next data object as the object number plus the number of PGs, and thereby determining the name of the next data object.
In some embodiments, further comprising:
and reading all data in the next data object and placing the data into the cache of the BlueStore layer.
In some embodiments, further comprising:
acquiring the preset size of each pre-read;
judging whether all remaining data in the corresponding data object is smaller than the per-read pre-read size;
and in response to all remaining data in the corresponding data object being smaller than the per-read pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the per-read pre-read size.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a data processing system, including:
the acquisition module is configured to, in response to receiving a read IO, acquire the number of times the data object corresponding to the read IO has been read within a preset time period;
the pre-reading module is configured to, in response to the read count being not less than a threshold, start pre-reading and read all remaining data in the corresponding data object according to the address offset in the read IO;
and the return module is configured to screen the data to be read corresponding to the read IO from the remaining data according to the address length in the read IO, return the data to be read to the upper layer, and place the remaining data into the cache of the BlueStore layer.
Based on the same inventive concept, according to another aspect of the present invention, an embodiment of the present invention further provides a computer apparatus, including:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor performs the steps of any one of the data processing methods described above when executing the program.
Based on the same inventive concept, according to another aspect of the present invention, there is also provided a computer-readable storage medium storing a computer program which, when executed by a processor, performs the steps of any of the data processing methods as described above.
The invention has one of the following beneficial technical effects: in the scheme provided by the invention, the BlueStore layer identifies the number of times the same object is read within the threshold time; for an object judged to be hot data, the range of the data read is enlarged and the read data is placed in the memory cache. This realizes data pre-reading, turning many small-IO reads into one large-IO read; the data of the next object can also be intelligently identified and pre-read into memory, so that subsequent sequential reads of the object fetch data directly from memory rather than from the disk each time, reducing the latency of each read and improving sequential small-IO read performance.
Drawings
In order to more clearly illustrate the embodiments of the invention or the technical solutions in the prior art, the drawings required for describing the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the invention, and that other drawings may be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is a flow chart of a data processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data processing system according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a computer device according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a computer-readable storage medium according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the following embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
It should be noted that, in the embodiments of the present invention, the expressions "first" and "second" are used to distinguish two entities or parameters with the same name. "First" and "second" are used only for convenience of expression and should not be construed as limiting the embodiments of the present invention; subsequent embodiments do not repeat this note.
In an embodiment of the invention, distributed block storage is a scalable storage architecture. It can distribute data across devices and share load among multiple servers. In physical-machine and virtual-machine applications, block storage may be used as a long-term storage device and typically offers high-level services such as backup and snapshot.
Object: stored data is divided into a number of objects, each with an object id. The object size is configurable and defaults to 4MB; the object can be regarded as the minimum storage unit of distributed storage.
OSD: the English name is Object Storage Device, and its main functions are data storage, data copying, balance data, recovery data, etc., and heartbeat checking with other OSD. In general, a hard disk corresponds to an OSD, and the OSD manages the hard disk storage, and of course, a partition may also be an OSD.
Bluestone: an object storage engine for managing underlying data is disclosed.
Onode: and recording a data structure of object metadata information, wherein each object corresponds to one onode.
Small IO: IO reads smaller than a preset size (e.g., 128 KB) are defined as small-IO reads.
PG: the PG is an aggregate of some objects, is a basic unit for forming a storage pool, and various characteristics of the storage pool, such as multiple copies, erasure codes and other data backup strategies, are finally realized by means of the PG.
Shard: the OSD set to which a PG is distributed is called a shard set, and each OSD in the set is called a shard. Each OSD in the set has its own number, starting from 0; the number of OSDs in a shard set varies with the data backup strategy. For example, under a three-copy backup strategy, a shard set contains 3 OSDs, numbered 0, 1, and 2.
Volume: data in block storage is stored in blocks within volumes, which are attached to nodes. Volumes provide greater storage capacity for applications, with better reliability and performance. The volumes formed by these blocks are mapped into the operating system and controlled by the file system layer.
Object name in block storage: the object size in block storage is typically 4MB. A volume is divided into objects of 4MB each, numbered from 0 according to the offset within the volume; an object name is composed of two parts: the volume name and the object number.
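The offset-to-object mapping described above can be sketched as follows. The `volume.hex-number` name layout and the separator are illustrative assumptions for the sketch; the patent only states that the name is composed of the volume name and the object number.

```python
OBJECT_SIZE = 4 * 1024 * 1024  # default object size in block storage: 4 MB


def object_name(volume_name: str, volume_offset: int) -> str:
    """Map a byte offset within a volume to the name of the object holding it.

    Objects are numbered from 0 by 4 MB offset within the volume; the
    "volume.number" format and 16-digit hex width are illustrative.
    """
    object_number = volume_offset // OBJECT_SIZE
    return f"{volume_name}.{object_number:016x}"
```

For example, an offset of 9 MB inside a volume falls in object number 2, since objects cover 0-4 MB, 4-8 MB, 8-12 MB, and so on.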
According to an aspect of the present invention, an embodiment of the present invention proposes a data processing method, as shown in fig. 1, which may include performing the following steps at the BlueStore layer:
S1, in response to receiving a read IO, acquiring the number of times the data object corresponding to the read IO has been read within a preset time period;
S2, in response to the read count being not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to the address offset in the read IO;
and S3, screening the data to be read corresponding to the read IO from the remaining data according to the address length in the read IO, returning the data to be read to the upper layer, and placing the remaining data into the cache of the BlueStore layer.
The scheme provided by the invention can improve sequential small-IO read performance, reduce frequent disk accesses, prolong disk service life, lower power consumption, reduce hardware cost, and improve product competitiveness.
In some embodiments, further comprising:
an access counter for recording the read count is added to the metadata of each data object, together with a timestamp recorder for recording the read time.
Specifically, an access counter and a timestamp recorder may be added to the data structure that records object metadata information (the onode), so as to count the number and times of small-IO read accesses to the object within a period of time.
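As a minimal sketch of this bookkeeping, the two recorders could be carried per object as below. The field and method names are illustrative assumptions, not the actual onode layout:

```python
from dataclasses import dataclass


@dataclass
class OnodeMeta:
    """Per-object metadata extended with read-ahead bookkeeping (illustrative)."""
    access_counter: int = 0    # small-IO reads observed in the current window
    last_read_ts: float = 0.0  # timestamp of the most recent small-IO read

    def record_read(self, now: float) -> None:
        """Count one small-IO read and remember when it happened."""
        self.access_counter += 1
        self.last_read_ts = now
```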
In some embodiments, in response to receiving a read IO, acquiring a number of times that a data object corresponding to the read IO is read in a preset time period, further includes:
in response to receiving the read IO, judging whether the data to be read can be hit in the cache;
and in response to the data to be read being hit in the cache, directly returning the data to be read to the upper layer.
Specifically, after receiving a read request, BlueStore first tries to read the data from the memory cache; on a cache hit it returns the data directly to the upper layer, and on a miss it reads the data from the disk.
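This cache-first read path can be sketched as follows. A plain dict stands in for the buffer cache and the disk; the names are illustrative, not BlueStore's actual API:

```python
def handle_read(cache: dict, disk: dict, key):
    """Serve a read: try the memory cache first, fall back to disk on miss.

    Returns (data, "hit") when the cache satisfies the read, otherwise
    (data, "miss") after fetching from disk (where the pre-read logic runs).
    """
    if key in cache:
        return cache[key], "hit"   # hit: return directly to the upper layer
    return disk[key], "miss"       # miss: read from the disk
```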
In some embodiments, further comprising:
and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter being 0, setting the access counter in the corresponding metadata to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter not being 0, acquiring the timestamp recorder in the corresponding metadata and comparing it with the current time;
and in response to the difference between the recorded timestamp and the current time being greater than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, further comprising:
and in response to the difference between the recorded timestamp and the current time being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.
In some embodiments, further comprising:
judging whether the access counter of the corresponding data object is not less than the threshold;
and in response to the access counter being not less than the threshold, starting pre-reading and reading all remaining data in the corresponding data object according to the address offset in the read IO.
In some embodiments, further comprising:
and in response to pre-reading not being started, directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
Specifically, when the number of times the same object is read from the disk by small IOs accumulates to a threshold (for example, 3 times) within a preset period (for example, 3 seconds), pre-reading is started.
When the data object is read for the first time, the counter value is still 0, so the object's counter is set to 1 and the current timestamp is recorded; no pre-read operation is performed at this point.
If the data object is not being read for the first time (the counter is not 0), the recorded timestamp is compared with the current time. If the interval exceeds the preset time period, the counter is set to 1, the timestamp is updated, and no pre-read is performed; if it does not exceed the preset time period, the counter is incremented by 1, the timestamp is updated, and the counter value is checked against the threshold: if the threshold is reached, pre-reading is started, otherwise it is not.
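The decision logic above can be sketched as one function. `WINDOW` and `THRESHOLD` use the example values from the text (3 seconds, 3 reads); all names are illustrative, and the behavior at exactly the window boundary is an assumption, since the text only specifies "greater than" and "smaller than":

```python
WINDOW = 3.0     # preset time period in seconds (example value from the text)
THRESHOLD = 3    # read-count threshold (example value from the text)


def should_prefetch(meta, now: float) -> bool:
    """Decide whether this small-IO read triggers pre-reading.

    `meta` carries .access_counter and .last_read_ts. A first read, or a
    read after the window has lapsed, restarts the count without
    pre-reading; otherwise the counter is incremented and compared
    against the threshold.
    """
    if meta.access_counter == 0 or now - meta.last_read_ts > WINDOW:
        meta.access_counter = 1       # first read or window expired: restart
        meta.last_read_ts = now
        return False
    meta.access_counter += 1          # still inside the window
    meta.last_read_ts = now
    return meta.access_counter >= THRESHOLD
```

With these values, three reads of the same object within 3 seconds turn pre-reading on; a long pause resets the count.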
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
acquiring and parsing the object name of the corresponding data object to obtain the object number and the storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs in the storage pool;
and determining the number of the next data object as the object number plus the number of PGs, and thereby determining the name of the next data object.
In some embodiments, further comprising:
and reading all data in the next data object and placing the data into the cache of the BlueStore layer.
Specifically, after pre-reading is started, the read length of the read IO is expanded from length to all remaining data of the object, starting from the read position (offset) of the read IO.
After pre-reading is started, the next object to be read on the PG can also be identified automatically from the current object name, and all of its data pre-read. The method is as follows: parse the object name to obtain the volume name and the object number; the current object number plus the number of PGs in the current storage pool is the number of the next object on the current shard, which yields the name of the next object. The data of that whole object is then pre-read according to the acquired object name.
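The next-object derivation can be sketched as below. It assumes the illustrative `volume.hex-number` name layout from the earlier definition of object names; the "add the PG count" step is exactly the rule stated in the text:

```python
def next_object_name(obj_name: str, pg_count: int) -> str:
    """Derive the next object on the same shard/PG to pre-read.

    Parses the object name into volume name and object number, then adds
    the storage pool's PG count to the number. The "volume.number" layout
    and hex width are illustrative assumptions.
    """
    volume, num_str = obj_name.rsplit(".", 1)
    next_num = int(num_str, 16) + pg_count
    return f"{volume}.{next_num:0{len(num_str)}x}"
```

For example, with 128 PGs in the pool, object number 2 of a volume is followed on the same PG by object number 130 (0x82).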
In some embodiments, further comprising:
acquiring the preset size of each pre-read;
judging whether all remaining data in the corresponding data object is smaller than the per-read pre-read size;
and in response to all remaining data in the corresponding data object being smaller than the per-read pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the per-read pre-read size.
Specifically, in some scenarios not all data of the next object is pre-read. For example, a per-read pre-read size may be set; if all remaining data in the data object holding the data to be read is smaller than this size, data of the corresponding size is read from the next data object so that the pre-read reaches the per-read pre-read size.
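Splitting one fixed-size pre-read across the object boundary, as described above, can be sketched as follows (names and the returned tuple shape are illustrative):

```python
def plan_prefetch(offset: int, object_size: int, prefetch_size: int):
    """Split one pre-read of `prefetch_size` bytes across object boundaries.

    Returns (bytes_from_current, bytes_from_next): when the data remaining
    in the current object past `offset` is smaller than the per-read
    pre-read size, the shortfall is taken from the next object.
    """
    remaining = object_size - offset
    if remaining >= prefetch_size:
        return prefetch_size, 0
    return remaining, prefetch_size - remaining
```

For instance, with 4 MB objects, a 2 MB pre-read starting 3 MB into an object takes the last 1 MB of the current object and the first 1 MB of the next.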
In some embodiments, after BlueStore reads the remaining data of the object locally, the length bytes originally requested are returned to the upper layer to ensure data consistency, and the remaining data is placed in the BlueStore cache. When the cached data reaches a certain threshold, the data added to the buffer earliest is trimmed to free up cache space.
It should be noted that the read-ahead at the BlueStore layer is not perceived by the upper layers.
According to the scheme provided by the invention, the BlueStore layer identifies the number of times the same object is read within the threshold time; for an object judged to be hot data, the range of the data read is enlarged and the read data is placed in the memory cache. This realizes data pre-reading, turning many small-IO reads into one large-IO read; the data of the next object can also be intelligently identified and pre-read into memory, so that subsequent sequential reads of the object fetch data directly from memory rather than from the disk each time, reducing the latency of each read and improving sequential small-IO read performance.
Based on the same inventive concept, according to another aspect of the present invention, there is also provided a data processing system 400, as shown in fig. 2, including:
an acquisition module 401, configured to, in response to receiving a read IO, acquire the number of times the data object corresponding to the read IO has been read within a preset time period;
a pre-reading module 402, configured to, in response to the read count being not less than a threshold, start pre-reading and read all remaining data in the corresponding data object according to the address offset in the read IO;
and a return module 403, configured to screen the data to be read corresponding to the read IO from the remaining data according to the address length in the read IO, return the data to be read to the upper layer, and place the remaining data into the cache of the BlueStore layer.
In some embodiments, the system further comprises a metadata module configured to:
an access counter for recording the read count is added to the metadata of each data object, together with a timestamp recorder for recording the read time.
In some embodiments, the acquisition module 401 is further configured to:
in response to receiving the read IO, judging whether the data to be read can be hit in the cache;
and in response to the data to be read being hit in the cache, directly returning the data to be read to the upper layer.
In some embodiments, the acquisition module 401 is further configured to:
and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.
In some embodiments, the acquisition module 401 is further configured to:
in response to the access counter being 0, setting the access counter in the corresponding metadata to 1 and not starting pre-reading.
In some embodiments, the acquisition module 401 is further configured to:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, the acquisition module 401 is further configured to:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, the pre-read module 402 is further configured to:
in response to the access counter not being 0, acquiring the timestamp recorder in the corresponding metadata and comparing it with the current time;
and in response to the difference between the recorded timestamp and the current time being greater than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.
In some embodiments, the pre-read module 402 is further configured to:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, the pre-read module 402 is further configured to:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, the pre-read module 402 is further configured to:
and in response to the difference between the recorded timestamp and the current time being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.
In some embodiments, the pre-read module 402 is further configured to:
judging whether the access counter of the corresponding data object is not less than a threshold value;
and in response to the access counter being not less than the threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO.
In some embodiments, the pre-read module 402 is further configured to:
and in response to pre-reading not being started, directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, the pre-read module 402 is further configured to:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, the pre-read module 402 is further configured to:
Acquiring and parsing the object name of the corresponding data object to obtain an object number and a storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs (placement groups) in the storage pool;
and determining the number of the next data object by adding the number of PGs to the object number, and further determining the name of the next data object.
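As a sketch of the naming arithmetic just described: assuming the objects of a volume are named `<volume name>.<zero-padded hex object number>` (an illustrative convention modeled on RBD-style data object names, not specified by the patent), the next object's name is obtained by adding the PG count to the parsed object number:

```python
def next_object_name(object_name: str, pg_count: int) -> str:
    """Parse '<volume>.<hex number>', add the PG count to the number,
    and rebuild the name with the same zero-padded width."""
    volume, hex_no = object_name.rsplit(".", 1)
    next_no = int(hex_no, 16) + pg_count
    return f"{volume}.{next_no:0{len(hex_no)}x}"
```

For example, with 128 PGs in the pool, the object after number 0x4 would be numbered 0x84 under this rule.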
In some embodiments, the pre-read module 402 is further configured to:
and reading all data in the next data object and putting the data into the cache of the BlueStore layer.
In some embodiments, the pre-read module 402 is further configured to:
acquiring the preset size of each pre-read;
judging whether all the remaining data in the corresponding data object is smaller than the per-pre-read size;
and in response to all the remaining data in the corresponding data object being smaller than the per-pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the per-pre-read size.
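The top-up rule above — when the current object's remaining data is smaller than the per-pre-read size, read the shortfall from the next object — amounts to a small split calculation (names are illustrative):

```python
def plan_preread(remaining_in_object: int, preread_size: int) -> tuple[int, int]:
    """Return (bytes to read from the current object, bytes to top up from
    the next object) so the total pre-read reaches preread_size."""
    from_current = min(remaining_in_object, preread_size)
    return from_current, preread_size - from_current
```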
According to the scheme provided by the invention, the BlueStore layer tracks how many times the same object is read within a threshold time window. For an object judged to be hot data, the range of each read is enlarged and the data read is placed in the memory cache, so that data pre-reading is realized and many small-IO reads are converted into a single large-IO read. The scheme can also intelligently recognize and pre-read the data of the next object into memory, so that subsequent sequential reads of that object are served directly from memory rather than from disk each time, reducing the latency of each read and improving small-IO sequential read performance.
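Putting the embodiments together, the read path can be modeled roughly as below. This is an illustrative sketch, not BlueStore code: the dict-based cache and metadata stores, the string stand-in for disk data, and the threshold/period defaults are all assumptions:

```python
def handle_read(obj, offset, length, cache, meta, now, threshold=4, period=2.0):
    """Return (data, preread_started). A cache hit returns directly; a miss
    updates the per-object counter/timestamp and, once the counter reaches
    the threshold, starts pre-reading (modeled here as caching the data)."""
    key = (obj, offset)
    if key in cache:                          # hit: serve from memory
        return cache[key], False
    counter, last_ts = meta.get(obj, (0, 0.0))
    counter = 1 if counter == 0 or now - last_ts > period else counter + 1
    meta[obj] = (counter, now)                # update counter and timestamp
    data = f"{obj}[{offset}:{offset + length}]"   # stand-in for a disk read
    preread = counter >= threshold            # object judged hot: pre-read
    if preread:
        cache[key] = data
    return data, preread
```

With these defaults, four reads of the same object within the window trip the threshold, and a later read at that offset is then served from the cache.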
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 3, an embodiment of the present invention further provides a computer device 501, comprising:
at least one processor 520; and
a memory 510 storing a computer program 511 executable on the processor, wherein the processor 520 performs the following steps when executing the program:
in response to receiving a read IO, acquiring the number of times that the data object corresponding to the read IO has been read in a preset time period;
in response to the number of reads being not less than a threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO;
and screening the data to be read corresponding to the read IO from all the remaining data according to the address length in the read IO, returning the data to be read to the upper layer, and putting the remaining data into the cache of the BlueStore layer.
In some embodiments, further comprising:
adding, to the metadata of each data object, an access counter for recording the number of reads and a timestamp recorder for recording the read time.
In some embodiments, in response to receiving a read IO, acquiring a number of times that a data object corresponding to the read IO is read in a preset time period, further includes:
In response to receiving the read IO, judging whether the data to be read can be hit in the cache;
and in response to the data to be read being hit in the cache, directly returning the data to be read to the upper layer.
In some embodiments, further comprising:
and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter being 0, setting the access counter in the corresponding metadata to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter not being 0, acquiring the timestamp recorder in the corresponding metadata and comparing it with the current time;
and in response to the difference between the recorded timestamp and the current time being greater than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, further comprising:
and in response to the difference between the recorded timestamp and the current time being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.
In some embodiments, further comprising:
judging whether the access counter of the corresponding data object is not less than a threshold value;
and in response to the access counter being not less than the threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO.
In some embodiments, further comprising:
and in response to pre-reading not being started, directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
acquiring and parsing the object name of the corresponding data object to obtain an object number and a storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs (placement groups) in the storage pool;
and determining the number of the next data object by adding the number of PGs to the object number, and further determining the name of the next data object.
In some embodiments, further comprising:
and reading all data in the next data object and putting the data into the cache of the BlueStore layer.
In some embodiments, further comprising:
acquiring the preset size of each pre-read;
judging whether all the remaining data in the corresponding data object is smaller than the per-pre-read size;
and in response to all the remaining data in the corresponding data object being smaller than the per-pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the per-pre-read size.
Based on the same inventive concept, according to another aspect of the present invention, as shown in fig. 4, an embodiment of the present invention further provides a computer-readable storage medium 601, the computer-readable storage medium 601 storing a computer program 610, the computer program 610 when executed by a processor performing the steps of:
in response to receiving a read IO, acquiring the number of times that the data object corresponding to the read IO has been read in a preset time period;
in response to the number of reads being not less than a threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO;
and screening the data to be read corresponding to the read IO from all the remaining data according to the address length in the read IO, returning the data to be read to the upper layer, and putting the remaining data into the cache of the BlueStore layer.
In some embodiments, further comprising:
adding, to the metadata of each data object, an access counter for recording the number of reads and a timestamp recorder for recording the read time.
In some embodiments, in response to receiving a read IO, acquiring a number of times that a data object corresponding to the read IO is read in a preset time period, further includes:
In response to receiving the read IO, judging whether the data to be read can be hit in the cache;
and in response to the data to be read being hit in the cache, directly returning the data to be read to the upper layer.
In some embodiments, further comprising:
and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter being 0, setting the access counter in the corresponding metadata to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter not being 0, acquiring the timestamp recorder in the corresponding metadata and comparing it with the current time;
and in response to the difference between the recorded timestamp and the current time being greater than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.
In some embodiments, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, further comprising:
and in response to the difference between the recorded timestamp and the current time being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.
In some embodiments, further comprising:
judging whether the access counter of the corresponding data object is not less than a threshold value;
and in response to the access counter being not less than the threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO.
In some embodiments, further comprising:
and in response to pre-reading not being started, directly reading the data to be read from the corresponding data object and returning it to the upper layer.
In some embodiments, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
In some embodiments, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
acquiring and parsing the object name of the corresponding data object to obtain an object number and a storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs (placement groups) in the storage pool;
and determining the number of the next data object by adding the number of PGs to the object number, and further determining the name of the next data object.
In some embodiments, further comprising:
and reading all data in the next data object and putting the data into the cache of the BlueStore layer.
In some embodiments, further comprising:
acquiring the preset size of each pre-read;
judging whether all the remaining data in the corresponding data object is smaller than the per-pre-read size;
and in response to all the remaining data in the corresponding data object being smaller than the per-pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the per-pre-read size.
Finally, it should be noted that, as will be appreciated by those skilled in the art, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above.
Further, it should be appreciated that the computer-readable storage medium (e.g., memory) herein can be either volatile memory or nonvolatile memory, or can include both volatile and nonvolatile memory.
Those of skill would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both. To clearly illustrate this interchangeability of hardware and software, various illustrative components, blocks, modules, circuits, and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as software or hardware depends upon the particular application and design constraints imposed on the overall system. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present disclosure.
The foregoing is an exemplary embodiment of the present disclosure, but it should be noted that various changes and modifications could be made herein without departing from the scope of the disclosure as defined by the appended claims. The functions, steps and/or actions of the method claims in accordance with the disclosed embodiments described herein need not be performed in any particular order. Furthermore, although elements of the disclosed embodiments may be described or claimed in the singular, the plural is contemplated unless limitation to the singular is explicitly stated.
It should be understood that as used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly supports the exception. It should also be understood that "and/or" as used herein is meant to include any and all possible combinations of one or more of the associated listed items.
The serial numbers of the foregoing embodiments of the present invention are for description only and do not represent the superiority or inferiority of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, and the program may be stored in a computer readable storage medium, where the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
Those of ordinary skill in the art will appreciate that: the above discussion of any embodiment is merely exemplary and is not intended to imply that the scope of the disclosure of embodiments of the invention, including the claims, is limited to such examples; combinations of features of the above embodiments or in different embodiments are also possible within the idea of an embodiment of the invention, and many other variations of the different aspects of the embodiments of the invention as described above exist, which are not provided in detail for the sake of brevity. Therefore, any omission, modification, equivalent replacement, improvement, etc. of the embodiments should be included in the protection scope of the embodiments of the present invention.
Claims (20)
1. A method of data processing, comprising performing the following steps at a BlueStore layer:
in response to receiving a read IO, obtaining the number of times that the data object corresponding to the read IO has been read in a preset time period, wherein the size of the data to be read corresponding to the read IO is smaller than a preset size;
in response to the number of reads being not less than a threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO;
screening the data to be read corresponding to the read IO from all the remaining data according to the address length in the read IO, returning the data to be read to the upper layer, and putting the remaining data into the cache of the BlueStore layer;
wherein, in response to the number of times read is not less than a threshold, starting pre-reading and reading all remaining data in the corresponding data object according to the address offset in the read IO, further comprising:
acquiring and parsing the object name of the corresponding data object to obtain an object number and a storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs (placement groups) in the storage pool;
determining the number of the next data object by adding the number of PGs to the object number, and further determining the name of the next data object;
and reading all data in the next data object and putting the data into the cache of the BlueStore layer.
2. The method as recited in claim 1, further comprising:
adding, to the metadata of each data object, an access counter for recording the number of reads and a timestamp recorder for recording the read time.
3. The method of claim 2, wherein in response to receiving a read IO, obtaining a number of times a data object corresponding to the read IO is read within a preset time period, further comprising:
In response to receiving the read IO, judging whether the data to be read can be hit in the cache;
and in response to the data to be read being hit in the cache, directly returning the data to be read to the upper layer.
4. A method as recited in claim 3, further comprising:
and in response to the data to be read not being hit in the cache, reading the access counter and the timestamp recorder in the metadata of the corresponding data object.
5. The method of claim 4, wherein in response to the number of times read is not less than a threshold, starting a read-ahead and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter being 0, setting the access counter in the corresponding metadata to 1 and not starting pre-reading.
6. The method as recited in claim 5, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
7. The method as recited in claim 5, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
8. The method of claim 4, wherein in response to the number of times read is not less than a threshold, starting a read-ahead and reading all remaining data in the corresponding data object according to an address offset in the read IO, further comprising:
in response to the access counter not being 0, acquiring the timestamp recorder in the corresponding metadata and comparing it with the current time;
and in response to the difference between the recorded timestamp and the current time being greater than the preset time period, setting the access counter of the corresponding data object to 1 and not starting pre-reading.
9. The method as recited in claim 8, further comprising:
and directly reading the data to be read from the corresponding data object and returning it to the upper layer.
10. The method as recited in claim 8, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
11. The method as recited in claim 8, further comprising:
and in response to the difference between the recorded timestamp and the current time being smaller than the preset time period, adding 1 to the access counter of the corresponding data object.
12. The method as recited in claim 11, further comprising:
judging whether the access counter of the corresponding data object is not less than a threshold value;
and in response to the access counter being not less than the threshold value, starting pre-reading and reading all the remaining data in the corresponding data object according to the address offset in the read IO.
13. The method as recited in claim 12, further comprising:
and in response to pre-reading not being started, directly reading the data to be read from the corresponding data object and returning it to the upper layer.
14. The method of claim 11 or 13, further comprising:
and updating the timestamp recorder in the corresponding metadata according to the current time.
15. The method as recited in claim 1, further comprising:
acquiring the preset size of each pre-read;
judging whether all the remaining data in the corresponding data object is smaller than the per-pre-read size;
and in response to all the remaining data in the corresponding data object being smaller than the per-pre-read size, reading data of the corresponding size from the next data object so that the pre-read data reaches the per-pre-read size.
16. The method as recited in claim 1, further comprising:
and reclaiming the data in the cache of the BlueStore layer.
17. The method of claim 16, wherein reclaiming the data in the cache of the BlueStore layer further comprises:
and in response to the amount of data in the cache of the BlueStore layer reaching a preset threshold value, reclaiming the data that was added to the cache earliest.
18. A data processing system, comprising:
the acquisition module is configured to, in response to receiving a read IO, acquire the number of times that the data object corresponding to the read IO has been read in a preset time period, wherein the size of the data to be read corresponding to the read IO is smaller than a preset size;
the pre-reading module is configured to, in response to the number of reads being not less than a threshold value, start pre-reading and read all the remaining data in the corresponding data object according to the address offset in the read IO;
the return module is configured to screen the data to be read corresponding to the read IO from all the remaining data according to the address length in the read IO, return the data to be read to the upper layer, and put the remaining data into the cache of the BlueStore layer;
the pre-reading module is further configured to:
acquiring and parsing the object name of the corresponding data object to obtain an object number and a storage volume name;
determining the corresponding storage pool according to the storage volume name and acquiring the number of PGs (placement groups) in the storage pool;
determining the number of the next data object by adding the number of PGs to the object number, and further determining the name of the next data object;
and reading all data in the next data object and putting the data into the cache of the BlueStore layer.
19. A computer device, comprising:
at least one processor; and
a memory storing a computer program executable on the processor, wherein the processor performs the steps of the method of any one of claims 1-17 when the program is executed.
20. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor performs the steps of the method according to any one of claims 1-17.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310026962.3A CN115827508B (en) | 2023-01-09 | 2023-01-09 | Data processing method, system, equipment and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115827508A CN115827508A (en) | 2023-03-21 |
CN115827508B true CN115827508B (en) | 2023-05-09 |
Family
ID=85520452
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310026962.3A Active CN115827508B (en) | 2023-01-09 | 2023-01-09 | Data processing method, system, equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115827508B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105786401A (en) * | 2014-12-25 | 2016-07-20 | 中国移动通信集团公司 | Data management method and device in server cluster system |
CN106844740A (en) * | 2017-02-14 | 2017-06-13 | 华南师范大学 | Data pre-head method based on memory object caching system |
CN111190655A (en) * | 2019-12-30 | 2020-05-22 | 中国银行股份有限公司 | Processing method, device, equipment and system for application cache data |
CN113687781A (en) * | 2021-07-30 | 2021-11-23 | 济南浪潮数据技术有限公司 | Method, device, equipment and medium for pulling up thermal data |
CN114138688A (en) * | 2021-11-14 | 2022-03-04 | 郑州云海信息技术有限公司 | Data reading method, system, device and medium |
CN114527938A (en) * | 2022-01-24 | 2022-05-24 | 苏州浪潮智能科技有限公司 | Data reading method, system, medium and device based on solid state disk |
CN115203072A (en) * | 2022-06-07 | 2022-10-18 | 中国电子科技集团公司第五十二研究所 | File pre-reading cache allocation method and device based on access heat |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115686385B (en) * | 2023-01-03 | 2023-03-21 | 苏州浪潮智能科技有限公司 | Data storage method and device, computer equipment and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US7711916B2 (en) | Storing information on storage devices having different performance capabilities with a storage system | |
CN102782683B (en) | Buffer pool extension for database server | |
CN113296696B (en) | Data access method, computing device and storage medium | |
US20150039837A1 (en) | System and method for tiered caching and storage allocation | |
CN110413685B (en) | Database service switching method, device, readable storage medium and computer equipment | |
CN114281762B (en) | Log storage acceleration method, device, equipment and medium | |
JP6409105B2 (en) | Storage constrained synchronization of shared content items | |
EP3789883A1 (en) | Storage fragment managing method and terminal | |
CN108108089B (en) | Picture loading method and device | |
CN113778662B (en) | Memory recovery method and device | |
CN113806300B (en) | Data storage method, system, device, equipment and storage medium | |
CN109558456A (en) | A kind of file migration method, apparatus, equipment and readable storage medium storing program for executing | |
CN107133334B (en) | Data synchronization method based on high-bandwidth storage system | |
CN117235088A (en) | Cache updating method, device, equipment, medium and platform of storage system | |
CN115827508B (en) | Data processing method, system, equipment and storage medium | |
JP2005258789A (en) | Storage device, storage controller, and write back cache control method | |
CN110413689B (en) | Multi-node data synchronization method and device for memory database | |
CN111913913A (en) | Access request processing method and device | |
KR101419428B1 (en) | Apparatus for logging and recovering transactions in database installed in a mobile environment and method thereof | |
CN114328007B (en) | Container backup and restoration method, device and medium thereof | |
CN117785933A (en) | Data caching method, device, equipment and readable storage medium | |
CN115543930A (en) | Method, device and related equipment for locking file in memory | |
CN103164431B (en) | The date storage method of relevant database and storage system | |
CN117950597B (en) | Data modification writing method, data modification writing device, and computer storage medium | |
CN112131433B (en) | Interval counting query method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||