CN114675785A - Distributed storage method, device, system and medium - Google Patents


Info

Publication number
CN114675785A
CN114675785A (application CN202210318371.9A)
Authority
CN
China
Prior art keywords
data
current data
pool
cache pool
cache
Prior art date
Legal status
Pending
Application number
CN202210318371.9A
Other languages
Chinese (zh)
Inventor
曹磊
高传集
孙思清
王腾飞
李超
Current Assignee
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202210318371.9A
Publication of CN114675785A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/061 Improving I/O performance
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F12/00 Accessing, addressing or allocating within memory systems or architectures
    • G06F12/02 Addressing or allocation; Relocation
    • G06F12/08 Addressing or allocation; Relocation in hierarchically structured memory systems, e.g. virtual memory systems
    • G06F12/0802 Addressing of a memory level in which the access to the desired data or data block requires associative addressing means, e.g. caches
    • G06F12/0877 Cache access modes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/067 Distributed or networked storage systems, e.g. storage area networks [SAN], network attached storage [NAS]

Abstract

The application discloses a distributed storage method: after current data is obtained, the method judges whether the current data hits in a cache pool; if it misses, the method judges whether the current data meets a residence condition. If so, the current data is made resident in the cache pool; if not, the current data is stored in the data pool. In the prior art, data first stored in the data pool must later be copied to the cache pool, which lengthens the IO path and reduces storage efficiency; by deciding residence when the data first arrives, this method shortens the IO path. In addition, the application also discloses a distributed storage device, system, and medium, which correspond to the distributed storage method and achieve the same effects.

Description

Distributed storage method, device, system and medium
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a distributed storage method, apparatus, system, and medium.
Background
As services grow over time, ever more data is generated, making data storage particularly important.
Devices for storing data are usually mechanical hard disks, solid state drives, non-volatile storage media, and the like. Among them, large-capacity storage media are slower but cheaper, while smaller-capacity storage media are faster but more expensive. Ceph, a unified distributed file system designed for excellent performance, reliability, and scalability, can store hot data in a cache pool backed by fast disk devices and cold data in a data pool backed by slow disk devices.
When the system acquires new data, the data misses in the cache pool and is therefore stored in the data pool; only when the data is acquired again is it copied to the cache pool so that it can be accessed quickly and frequently thereafter. In this approach, data must be copied from the data pool to the cache pool, the IO path is long, and data storage efficiency is reduced.
Therefore, how to shorten the IO path and improve the efficiency of data storage is an urgent problem to be solved by those skilled in the art.
Disclosure of Invention
The application aims to provide a distributed storage method, a device, a system and a medium, which are used for shortening an IO path and improving the efficiency of data storage.
In order to solve the above technical problem, the present application provides a distributed storage method, including:
acquiring current data;
judging whether the current data is hit in a cache pool or not;
if not, judging whether the current data meets the resident condition;
if yes, the current data is resident in the cache pool; and if not, storing the current data into a data pool.
Preferably, the determining whether the current data meets the residence condition includes:
defining a heat value for the current data;
and judging whether the heat value of the current data is not less than a threshold value; if so, determining that the residence condition is met, and if not, determining that the residence condition is not met.
Preferably, the defining the heat value for the current data includes:
creating a set every preset time, and storing the object name of the current data into the newly created set; the object names of the data accessing the cache pool within preset time are stored in the set;
judging whether the number of the sets is larger than a preset value or not, and if so, deleting the set created earliest;
wherein, each set is provided with a heat value when being created, so that the heat values of the data corresponding to the object names entering the same set are the same;
the heat value of the newly created set is greater than that of the previously created set, and the heat value of each set decreases as newer sets are created;
and adding up the heat values of the sets in which the object name of the current data appears, the sum serving as the heat value of the current data.
Preferably, after the step of residing the current data in the cache pool, the method further includes:
defining a heat attribute value for the current data;
the heat attribute value decreases as time passes and increases with the number of hits;
and sorting the data in the cache pool according to the heat attribute value, and, at every second preset time interval, evicting the data with the lowest heat attribute value from the cache pool and storing it in the data pool.
Preferably, the method further comprises the following steps:
and adjusting the second preset time according to an error between the change speed of the data volume in the cache pool and an expected change speed; specifically, the second preset time is the sum of a proportional term (the error multiplied by a proportional coefficient), an integral term (the integral of the error multiplied by an integral coefficient), and a differential term (the derivative of the error multiplied by a differential coefficient).
Preferably, after the step of residing the current data in the cache pool, the method further includes:
adjusting a cache mode according to the data volume of the cache pool, and flushing dirty data in the cache pool back to the data pool, wherein the cache mode comprises a write-back mode, a direct-write mode and a write-around mode;
if the data volume of the cache pool is smaller than a first threshold value, switching the cache mode to the write-back mode;
if the data volume of the cache pool is larger than the first threshold and smaller than a second threshold, switching the cache mode to the write-through mode;
and if the data volume of the cache pool is larger than the second threshold value, switching the cache mode to the write-around mode.
Preferably, the method further comprises the following steps: and adjusting the use interval of the thread according to the acquired data volume of the current data so as to store the data in the cache pool and the data in the data pool into a nonvolatile storage medium.
In order to solve the above technical problem, the present application further provides a distributed storage apparatus, including:
the acquisition module is used for acquiring current data;
the first judgment module is used for judging whether the current data is hit in the cache pool or not;
the second judging module is used for judging whether the current data meet the residence condition or not if the current data do not hit;
the processing module is used for residing the current data in the cache pool if the current data meets the requirement; and if not, storing the current data into a data pool.
In order to solve the above technical problem, the present application further provides a distributed storage system, which includes a memory for storing a computer program;
a processor for implementing the steps of the distributed storage method as described above when executing the computer program.
In order to solve the above technical problem, the present application further provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the steps of the distributed storage method as described above.
According to the distributed storage method, after the current data is obtained, whether it hits in the cache pool is judged; if it misses, whether it meets the residence condition is judged. If so, the current data is made resident in the cache pool; if not, it is stored in the data pool. Unlike the prior art, in which data must be copied from the data pool to the cache pool at the cost of storage efficiency, this method decides residence when the data first arrives and thereby shortens the IO path.
In addition, the distributed storage device, the system and the medium provided by the application correspond to the distributed storage method, and the effect is the same as that of the distributed storage method.
Drawings
In order to more clearly illustrate the embodiments of the present application, the drawings needed for the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained by those skilled in the art without inventive effort.
Fig. 1 is a flowchart of a distributed storage method according to an embodiment of the present application;
fig. 2 is a structural diagram of a distributed storage apparatus according to an embodiment of the present application;
fig. 3 is a structural diagram of a distributed storage system according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only some embodiments of the present application, and not all embodiments. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without any creative effort belong to the protection scope of the present application.
The core of the application is to provide a distributed storage method, device, system and medium, which are used for shortening IO paths and improving data storage efficiency.
In order that those skilled in the art will better understand the disclosure, the following detailed description will be given with reference to the accompanying drawings.
Fig. 1 is a flowchart of a distributed storage method according to an embodiment of the present application, and as shown in fig. 1, the method includes:
s10: and acquiring current data.
S11: and judging whether the current data is hit in the cache pool, and if not, entering the step S12.
S12: and judging whether the current data meets the resident condition, if so, entering the step S13, and if not, entering the step S14.
S13: the current data resides in the cache pool.
S14: and storing the current data into a data pool.
Ceph, a unified distributed file system designed for excellent performance, reliability, and scalability, can store hot data in a cache pool backed by fast disk devices and cold data in a data pool backed by slow disk devices. In Ceph, an OSD (Object Storage Device) is the process responsible for returning specific data in response to a client request. When the cache pool is created, each OSD in the cache pool must correspond one-to-one with an OSD of the data pool; specifically, a fast disk device of the cache pool corresponds to a slow disk device of the data pool on the same node. An OSD in the cache pool may be a fast disk device or a logical volume carved from a fast disk device; the mapping relationship is contained in the osdmap, broadcast globally, and finally persisted to disk. It should be noted that, once the mapping relationship is established, cross-pool data migration occurs on the same node with a single replica or erasure-code segment on the OSD as the granularity, so the influence of cross-pool data migration on user IO is greatly reduced and the network-layer cost of data migration is lowered.
In steps S10 and S11, after receiving the user service IO, that is, the current data, the system first judges whether the data already exists in the cache pool, i.e., whether it hits. If it misses, the process goes to step S12 to judge whether the current data meets the residence condition; specifically, this is judged from factors such as whether the current data has been acquired multiple times and its historical acquisition times. If the residence condition is met, the process proceeds to step S13 and the current data is made resident in the cache pool; otherwise, the current data is saved in the data pool. It will be appreciated that data residing in the cache pool should be frequently used hot data, and data in the data pool should be infrequently used cold data. In a specific implementation, the judgment of whether data may reside in the cache pool is performed repeatedly; as the process runs, hot data in the cache pool that turns into cold data or dirty data should be stored back into the data pool through eviction or flush-back.
In a specific implementation, user services include reads and writes. For a read, after the data is requested, the system judges whether it hits in the cache pool; if so, the data can be read directly, and if not, the data is read from the data pool and the residence condition is checked; if the condition is met, the data is made resident in the cache pool.
For a write, the system likewise first judges whether the data hits in the cache pool; if so, the write can be performed in the cache pool, and if not, the residence condition must be checked. If the residence condition is met, the written data is made resident in the cache pool; otherwise it is stored in the data pool, which is where the write flow differs from the read flow.
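The decision flow of steps S10 to S14 can be sketched as below. This is an illustrative stand-in, not Ceph code: `Item`, `handle_io`, the dict-backed pools, and the `meets_residence_condition` callback are all hypothetical names introduced for the example.

```python
from collections import namedtuple

# Hypothetical stand-ins for the patent's cache pool / data pool; real
# Ceph pools are OSD-backed, not Python dicts.
Item = namedtuple("Item", "key value")

def handle_io(item, cache_pool, data_pool, meets_residence_condition):
    """Route incoming data per steps S10-S14: a cache hit is served from
    the cache pool; a miss either resides in the cache pool (if the
    residence condition holds) or goes to the data pool."""
    if item.key in cache_pool:                # S11: hit check
        return cache_pool[item.key]           # served directly from cache
    if meets_residence_condition(item):       # S12: residence check
        cache_pool[item.key] = item.value     # S13: reside in cache pool
    else:
        data_pool[item.key] = item.value      # S14: store in data pool
    return item.value
```

Note that hot data lands in the cache pool on first arrival, which is exactly how the method avoids the later data-pool-to-cache-pool copy of the prior art.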
According to the distributed storage method provided by the embodiment of the application, after the current data is obtained, whether it hits in the cache pool is judged; if it misses, whether it meets the residence condition is judged. If so, the current data is made resident in the cache pool; if not, it is stored in the data pool. Unlike the prior art, in which data must be copied from the data pool to the cache pool at the cost of storage efficiency, this method decides residence when the data first arrives and thereby shortens the IO path.
It is understood that the data that can reside in the cache pool should be data that needs to be frequently used, and when determining whether the data can reside, the determination should be made in combination with the frequency of occurrence of the current data and the time interval of occurrence. On the basis of the foregoing embodiment, this embodiment provides a specific method for determining whether current data meets a resident condition, where the method includes:
defining a heat value for the current data;
and judging whether the heat value of the current data is not less than a threshold value; if so, determining that the residence condition is met, and if not, determining that the residence condition is not met.
In this embodiment, a heat value is defined for the current data; it reflects the frequency and recency of the current data's historical occurrences. When the heat value of the current data is not less than the threshold value, the current data meets the residence condition and should reside in the cache pool.
The distributed storage method provided by the embodiment uses the heat value as a judgment basis for judging whether the current data can reside in the cache pool, and can more accurately reflect the heat condition of the current data.
The foregoing embodiment provides a specific method for determining whether current data meets the residence condition, and this embodiment provides a specific method for defining a heat value for the current data within that method. In this embodiment, defining a heat value for the current data includes:
creating a set every preset time, and storing the object name of the current data into the newly created set; the object names of the data accessing the cache pool within preset time are stored in the set;
judging whether the number of the sets is larger than a preset value or not, and if so, deleting the set created earliest;
wherein, each set is provided with a heat value when being created, so that the heat values of the data corresponding to the object names entering the same set are the same;
the heat value of the newly created set is greater than that of the previously created set, and the heat value of each set decreases as newer sets are created;
and adding up the heat values of the sets in which the object name of the current data appears, the sum serving as the heat value of the current data.
It should be noted that each set in this embodiment is a set of string type created after the cache pool is created. A set stores the object names of the data that accessed the cache pool within a preset time, and the preset time may be determined in combination with the data volume of the user service IO. In this embodiment, a new set is created every preset time, and when the number of sets exceeds a preset value, the earliest-created set is deleted. That is, the maximum number of sets is fixed; after reaching the preset value, the number of sets stays unchanged. Each set is assigned a heat value at creation, so the data corresponding to object names entered into the same set share the same heat value. In other embodiments the heat value of every set may be the same; this embodiment provides a preferable scheme in which the heat value of the newly created set is greater than that of the previously created set, and the heat value of each set decreases as newer sets are created. When judging, the heat values of all sets in which the object name of the current data appears are added up and taken as the heat value of the current data. It will be appreciated that, because sets are continually deleted and created, the time window over which heat is counted is fixed (the preset time multiplied by the preset number of sets). The heat value of each set therefore changes as sets are deleted and created; for example, the newly created set has the highest heat value, which decreases once the next set is created.
For ease of understanding, the following description is provided in conjunction with a specific usage scenario.
For example, suppose the preset value of the number of sets is 4, and the heat values from the oldest-created set to the newest-created set are 4, 4, 8, and 12 in order.
It is understood that if the object name of the current data appears in the newest set and the second-newest set, the heat value of the current data is 20. After the heat value of the current data is counted, it can be compared with the threshold. In this scenario the maximum possible heat value is 28 (4 + 4 + 8 + 12), so the threshold may be set at 7 × k, where k is an integer from 0 to 4. For example, if k is 3 the threshold is 21, so data whose object name appears only in the newest and second-newest sets has a heat value of 20, below the threshold, and cannot reside. If k is 0, data can reside as long as its name appears in any set; if k is 4, the object name must appear in every set for the data to reside.
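The rotating-set heat computation above can be sketched as follows. This is an illustrative sketch under the example's assumptions (4 sets; weights 4, 4, 8, 12 from oldest to newest); the function names and the use of a `deque` are the example's own, not from the patent.

```python
from collections import deque

MAX_SETS = 4
WEIGHTS = [4, 4, 8, 12]        # heat value of each set, oldest ... newest

sets = deque(maxlen=MAX_SETS)  # deque drops the earliest-created set itself

def rotate():
    """Called every preset interval: create a new, empty set."""
    sets.append(set())

def record_access(obj_name):
    """Store the accessing object's name in the newest set."""
    sets[-1].add(obj_name)

def heat_value(obj_name):
    """Sum the weights of every set the object's name appears in."""
    offset = MAX_SETS - len(sets)  # align weights while fewer sets exist
    return sum(WEIGHTS[offset + i]
               for i, s in enumerate(sets) if obj_name in s)
```

With `deque(maxlen=MAX_SETS)`, appending a fifth set silently discards the oldest one, which matches the "delete the earliest-created set" rule; `heat_value` then only ever counts the fixed time window the text describes.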
The embodiment provides a specific method for defining the heat value for the current data, which is used for converting the historical occurrence times and the occurrence time of the current data into the heat value so as to judge the heat value and the threshold value.
The capacity of the cache pool is limited and relatively small; if too much data resides in it, the IO rate suffers.
Therefore, on the basis of the above embodiment, in this embodiment, after the step of residing the current data in the cache pool, the method further includes:
defining a heat attribute value for the current data;
the heat attribute value decreases as time passes and increases with the number of hits;
and sorting the data in the cache pool according to the heat attribute value, and, at every second preset time interval, evicting the data with the lowest heat attribute value from the cache pool and storing it in the data pool.
In this embodiment, current data, upon residing in the cache pool, is given a heat attribute value that represents the data's usage. Each datum is assigned an initial heat attribute value when it becomes resident; the initial value provided by this embodiment is the maximum value of two bytes, 65535. The heat attribute value decreases as time passes and increases with the number of hits, for example decreasing by 1 every second and increasing by 1 on every hit. To reduce the data-volume pressure on the cache pool, the data in the cache pool is sorted by heat attribute value; concretely, a small-top (min-)heap may be used. At every second preset time interval, the data with the lowest heat attribute value is evicted from the cache pool and stored in the data pool.
In the distributed storage method provided by this embodiment, the heat attribute values are defined for the data residing in the cache pool and are sorted, and the data with the lowest heat attribute value is evicted from the cache pool and stored in the data pool, so that the data capacity of the cache pool is released, and the IO rate is increased.
In a specific implementation, the time interval between two evictions should vary with the amount of data in the cache pool. Therefore, the distributed storage method provided in this embodiment further includes:
adjusting the second preset time according to the error between the change speed of the data volume in the cache pool and the expected change speed; specifically, the second preset time is the sum of a proportional term (the error multiplied by a proportional coefficient), an integral term (the integral of the error multiplied by an integral coefficient), and a differential term (the derivative of the error multiplied by a differential coefficient).
It is understood that the larger the integral coefficient, the smaller the steady-state error between the cache-pool data-volume proportion and the desired proportion, but the longer it takes the proportion to reach the desired value. The larger the differential coefficient, the shorter that time, but the less stable the system, which easily makes the cache-pool data proportion oscillate up and down. Therefore, the proportional, integral, and differential coefficients need to be tuned for different user IO scenarios. It should be noted that the expected change speed of the cache-pool data volume is inversely related to the cache-pool data capacity: the expected change speed is 0 when the data capacity equals a set capacity threshold, positive when the capacity is below the threshold, and negative otherwise.
In the distributed storage method provided in this embodiment, the second preset time is adjusted according to the error between the change speed of the data volume in the cache pool and the expected change speed; eviction thus speeds up when the remaining capacity of the cache pool is low, which benefits user services.
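The PID adjustment described above can be sketched as a small discrete-time controller. The class name, the `dt` step, and the coefficient values in the usage are assumptions of this example; the structure (proportional + integral + differential terms over the rate-of-change error) follows the text.

```python
class PidInterval:
    """Discrete PID controller producing the second preset time from the
    error between measured and expected cache-pool growth rates."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.integral = 0.0      # running integral of the error
        self.prev_error = 0.0    # for the differential term

    def next_interval(self, measured_rate, expected_rate, dt=1.0):
        error = measured_rate - expected_rate
        self.integral += error * dt                  # integral term input
        derivative = (error - self.prev_error) / dt  # differential input
        self.prev_error = error
        return (self.kp * error            # proportional term
                + self.ki * self.integral  # integral term
                + self.kd * derivative)    # differential term
```

As the text notes, a larger `ki` shrinks the steady-state error at the cost of settling time, while a larger `kd` reacts faster but risks oscillation; the real coefficients would be tuned per user IO scenario.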
During operation, the latest version of a datum may exist in the cache pool while a historical version exists in the data pool; the copy in the cache pool is then dirty data, and the dirty data needs to be flushed back so that the version in the data pool is updated to the latest. Therefore, on the basis of the above embodiment, after the step of making the current data resident in the cache pool, the method of this embodiment further includes:
adjusting a cache mode according to the data volume of the cache pool, and flushing dirty data in the cache pool back to the data pool, wherein the cache mode comprises a write-back mode, a direct-write mode and a write-around mode;
if the data volume of the cache pool is smaller than a first threshold value, switching the cache mode into a write-back mode;
if the data volume of the cache pool is larger than a first threshold value and smaller than a second threshold value, switching the cache mode into a direct-write mode;
and if the data volume of the cache pool is larger than the second threshold value, switching the cache mode into a write-around mode.
In this embodiment, the cache mode for current data is adjusted according to the data volume of the cache pool. For example, when the ratio of the cache pool's data volume to its total capacity is at most 80%, the cache mode is write-back: when data meets the residence condition, a write IO returns the result to the client as soon as the cache pool completes, and dirty data is flushed back to the data pool by a separate write-back thread. When the proportion is greater than 80% and at most 90%, the cache mode is write-through: a write IO returns the client result only after both the cache pool and the data pool complete, except that a write IO targeting dirty data already in the cache pool returns as soon as the cache pool completes. When the proportion exceeds 90%, the cache mode is write-around: write IO skips the cache pool and accesses the data pool directly. It can be appreciated that write-back gives faster IO when the data volume is small, write-through prevents losing large amounts of data when the volume is large, and write-around is more efficient when the volume is extremely large.
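The mode switch above reduces to a pair of threshold comparisons. A minimal sketch, using the 80% and 90% thresholds from the example in the text (the function and constant names are this example's own):

```python
WRITE_BACK = "write-back"
WRITE_THROUGH = "write-through"
WRITE_AROUND = "write-around"

def select_cache_mode(used_ratio, first_threshold=0.8, second_threshold=0.9):
    """Pick a cache mode from the cache pool's usage ratio (0.0-1.0)."""
    if used_ratio <= first_threshold:
        return WRITE_BACK      # small data volume: fastest IO
    if used_ratio <= second_threshold:
        return WRITE_THROUGH   # moderate volume: guard against data loss
    return WRITE_AROUND        # nearly full: write IO bypasses the cache pool
```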
In the foregoing embodiment, according to different situations of the data amount in the cache pool, the time interval between every two evictions is adjusted by using a PID adjustment method, and similarly, in this embodiment, the back-flushing interval of the dirty data may also be adjusted by using this method, which is not described again in this embodiment.
In a specific implementation, an fdatasync thread calls the kernel fdatasync interface to persist the data of the cache pool and the data pool to the nonvolatile storage medium. On the basis of the above embodiment, the method of this embodiment further includes: adjusting the usage interval of the thread according to the data volume of the acquired current data, so as to store the data in the cache pool and the data pool into the nonvolatile storage medium.
In the distributed storage method provided by this embodiment, the larger the data volume of the current data, the shorter the usage interval of the fdatasync thread; the smaller the data volume, the longer the interval. Calling the fdatasync thread frequently when the data volume is large prevents losing large amounts of data on a disk failure, while reducing the call frequency when the data volume is small reduces resource consumption.
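The inverse relation between incoming data volume and sync interval can be sketched as below. The clamping bounds and the `scale` constant are assumptions of this example; the patent only fixes the direction of the relationship (more data, shorter interval).

```python
def sync_interval(data_volume, min_interval=0.1, max_interval=10.0,
                  scale=1000.0):
    """More incoming data -> shorter interval (seconds) between
    fdatasync calls; clamped to [min_interval, max_interval]."""
    if data_volume <= 0:
        return max_interval            # idle: sync as rarely as allowed
    interval = scale / data_volume     # inverse relation to data volume
    return max(min_interval, min(max_interval, interval))
```

The clamp matters in practice: the lower bound stops a burst of writes from saturating the disk with sync calls, and the upper bound caps the window of data that a disk failure could lose.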
In the foregoing embodiments, the distributed storage method is described in detail, and the present application also provides embodiments corresponding to the distributed storage apparatus. It should be noted that the present application describes the embodiments of the apparatus portion from two perspectives, one from the perspective of the function module and the other from the perspective of the hardware.
Fig. 2 is a structural diagram of a distributed storage apparatus according to an embodiment of the present application, and as shown in fig. 2, the apparatus includes:
and an obtaining module 10, configured to obtain current data.
The first judging module 11 is configured to judge whether the current data hits in the cache pool.
The second determining module 12 is configured to determine, if the current data does not hit, whether the current data meets the residence condition.
A processing module 13, configured to, if yes, reside the current data in the cache pool; and if not, storing the current data to the data pool.
Since the embodiments of the apparatus portion and the method portion correspond to each other, please refer to the description of the embodiments of the method portion for the embodiments of the apparatus portion, which is not repeated here.
According to the distributed storage apparatus provided by the embodiment of the application, after the current data is obtained, whether the current data hits the cache pool is judged, and if it misses, whether the current data meets the residence condition is judged. If yes, the current data is resident in the cache pool; if not, the current data is stored into the data pool. In the prior art, data must be copied from the data pool to the cache pool, which reduces data storage efficiency; here, data meeting the residence condition is placed in the cache pool directly, avoiding that copy.
Fig. 3 is a structural diagram of a distributed storage system according to an embodiment of the present application, and as shown in fig. 3, the system includes: a memory 20 for storing a computer program;
and a processor 21, configured to implement the steps of the distributed storage method according to the above-described embodiment when executing the computer program.
The distributed storage system provided by the embodiment may include, but is not limited to, a smart phone, a tablet computer, a notebook computer, or a desktop computer.
The processor 21 may include one or more processing cores, for example a 4-core or 8-core processor. The processor 21 may be implemented in hardware using at least one of a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 21 may also include a main processor and a coprocessor: the main processor, also called a central processing unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 21 may be integrated with a graphics processing unit (GPU), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 21 may further include an artificial intelligence (AI) processor for handling machine-learning computations.
The memory 20 may include one or more computer-readable storage media, which may be non-transitory. The memory 20 may also include high-speed random access memory and nonvolatile memory, such as one or more magnetic disk storage devices or flash memory storage devices. In this embodiment, the memory 20 at least stores the computer program 201, which, after being loaded and executed by the processor 21, implements the relevant steps of the distributed storage method disclosed in any of the foregoing embodiments. In addition, the resources stored in the memory 20 may also include an operating system 202, data 203, and the like, stored either transiently or permanently. The operating system 202 may include Windows, Unix, Linux, and the like. The data 203 may include, but is not limited to, heat values, heat attribute values, and the like.
In some embodiments, the distributed storage system may further include a display 22, an input/output interface 23, a communication interface 24, a power supply 25, and a communication bus 26.
Those skilled in the art will appreciate that the structure shown in fig. 3 does not limit the distributed storage system, which may include more or fewer components than those shown.
The distributed storage system provided by the embodiment of the application comprises a memory and a processor, and when the processor executes a program stored in the memory, the following method can be realized:
acquiring current data;
judging whether the current data is hit in the cache pool;
if not, judging whether the current data meets the residence condition;
if yes, the current data is resident in a cache pool; and if not, storing the current data into the data pool.
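The four steps above can be sketched as follows (the dictionary-backed pools and the `meets_residence_condition` predicate are illustrative assumptions, not the patent's data structures):

```python
def store(current_data, cache_pool, data_pool, meets_residence_condition):
    """Sketch of the claimed flow: on a cache-pool hit the data is
    already resident; on a miss, data meeting the residence condition
    is resident in the cache pool, and other data goes straight to the
    data pool."""
    key = current_data["name"]
    if key in cache_pool:                      # hit: nothing to place
        return "hit"
    if meets_residence_condition(current_data):
        cache_pool[key] = current_data         # reside hot data
        return "cache_pool"
    data_pool[key] = current_data              # cold data bypasses cache
    return "data_pool"
```

A usage example: with a predicate that admits data whose heat value is at least some threshold, hot objects land in the cache pool, repeated requests hit, and cold objects are stored directly in the data pool.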
Finally, the application also provides a corresponding embodiment of the computer readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps as set forth in the above-mentioned method embodiments.
It is to be understood that, if the methods in the above embodiments are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. Based on such understanding, all or part of the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium and performs all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a portable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The distributed storage methods, apparatuses, systems, and media provided herein have been described in detail above. The embodiments in this specification are described in a progressive manner; each embodiment focuses on its differences from the others, and for the same or similar parts the embodiments may refer to one another. Since the apparatus disclosed in an embodiment corresponds to the method disclosed in an embodiment, its description is brief, and relevant details can be found in the description of the method part. It should be noted that those skilled in the art can make several improvements and modifications to the present application without departing from its principles, and such improvements and modifications also fall within the scope of the claims of the present application.
It should also be noted that, in this specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A distributed storage method, comprising:
acquiring current data;
judging whether the current data is hit in a cache pool or not;
if not, judging whether the current data meets the residence condition;
if yes, the current data is resident in the cache pool; and if not, storing the current data into a data pool.
2. The distributed storage method according to claim 1, wherein said judging whether the current data meets the residence condition comprises:
defining a heat value for the current data;
and judging whether the heat value of the current data is not less than a threshold value; if so, determining that the residence condition is met, and if not, determining that the residence condition is not met.
3. The distributed storage method of claim 2, wherein said defining a heat value for said current data comprises:
creating a new set every preset time, and storing the object name of the current data into the newly created set; wherein each set stores the object names of the data that accessed the cache pool within the corresponding preset time;
judging whether the number of sets is greater than a preset value, and if so, deleting the earliest-created set;
wherein a heat value is set for each set at creation, so that the heat values of the data whose object names enter the same set are the same;
the heat value of a more recently created set is greater than that of an earlier-created set, and the heat value of each set decreases as its distance in time from the latest set increases;
and summing the heat values of the sets containing the object name of the current data, the sum serving as the heat value of the current data.
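The windowed-set heat calculation of claim 3 can be sketched as follows; the class name, the deque representation, and the linear per-set weights are illustrative assumptions, not part of the claim:

```python
from collections import deque


class HeatTracker:
    """Windowed heat tracking: a new set is created every preset
    interval, each set records the object names accessed in that
    window, newer sets carry a larger per-set heat value, and the
    earliest-created set is deleted once the set count exceeds a
    preset value."""

    def __init__(self, max_sets=4):
        self.max_sets = max_sets
        self.sets = deque()              # newest window is sets[0]

    def new_window(self):
        self.sets.appendleft(set())
        if len(self.sets) > self.max_sets:
            self.sets.pop()              # delete the earliest-created set

    def record_access(self, object_name):
        self.sets[0].add(object_name)

    def heat_value(self, object_name):
        # Newest set weighs max_sets, the next max_sets - 1, and so on;
        # an object's heat is the sum over the windows that contain it.
        return sum(self.max_sets - age
                   for age, s in enumerate(self.sets)
                   if object_name in s)
```

With this weighting, an object accessed in every recent window accumulates a high heat value, while one last seen several windows ago decays toward zero as its windows age out.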
4. The distributed storage method according to claim 1, further comprising, after said step of residing said current data in said cache pool:
defining a heat attribute value for the current data;
the heat attribute value decreases along with the increase of time and increases along with the increase of the number of hits;
and sorting the data in the cache pool according to the heat attribute value, and, every second preset time, evicting the data with the lowest heat attribute value from the cache pool and storing it into the data pool.
5. The distributed storage method of claim 4, further comprising:
and adjusting the second preset time according to an error between the change speed of the data amount in the cache pool and an expected change speed, wherein the adjustment amount is the sum of a proportional factor times the error, an integral factor times the integral term of the error, and a differential factor times the differential term of the error.
6. The distributed storage method according to claim 1, further comprising, after said step of residing said current data in said cache pool:
adjusting a cache mode according to the data volume of the cache pool, and flushing dirty data in the cache pool back to the data pool, wherein the cache mode comprises a write-back mode, a direct-write mode and a write-around mode;
if the data volume of the cache pool is smaller than a first threshold value, switching the cache mode to the write-back mode;
if the data volume of the cache pool is larger than the first threshold and smaller than a second threshold, switching the cache mode to the write-through mode;
and if the data volume of the cache pool is larger than the second threshold value, switching the cache mode to the write-around mode.
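The mode selection of claim 6 can be sketched as follows; the behaviour at exactly a threshold is not specified by the claim, so the comparisons below are one possible reading, and the function name is invented for the example:

```python
def select_cache_mode(cache_amount, first_threshold, second_threshold):
    """Choose the cache mode from the cache-pool data amount: a lightly
    loaded pool absorbs writes (write-back), a fuller pool also writes
    through to the data pool (write-through), and a nearly full pool
    bypasses the cache entirely (write-around)."""
    if cache_amount < first_threshold:
        return "write-back"
    if cache_amount < second_threshold:
        return "write-through"
    return "write-around"
```

The design intent is that write-back gives the best latency while there is headroom, write-through stops dirty data from accumulating as the pool fills, and write-around protects a saturated cache from being churned by new writes.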
7. The distributed storage method according to claim 1, further comprising: adjusting the invocation interval of a thread according to the data amount of the obtained current data, the thread storing the data in the cache pool and the data in the data pool into a nonvolatile storage medium.
8. A distributed storage apparatus, comprising:
the acquisition module is used for acquiring current data;
the first judgment module is used for judging whether the current data is hit in the cache pool or not;
the second judging module is used for judging, if the current data misses the cache pool, whether the current data meets the residence condition;
the processing module is used for residing the current data in the cache pool if the residence condition is met, and for storing the current data into a data pool if it is not.
9. A distributed storage system comprising a memory for storing a computer program;
a processor for implementing the steps of the distributed storage method of any one of claims 1 to 7 when executing said computer program.
10. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when being executed by a processor, carries out the steps of the distributed storage method according to any one of claims 1 to 7.
CN202210318371.9A 2022-03-29 2022-03-29 Distributed storage method, device, system and medium Pending CN114675785A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210318371.9A CN114675785A (en) 2022-03-29 2022-03-29 Distributed storage method, device, system and medium


Publications (1)

Publication Number Publication Date
CN114675785A true CN114675785A (en) 2022-06-28

Family

ID=82075661


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115686385A (en) * 2023-01-03 2023-02-03 苏州浪潮智能科技有限公司 Data storage method and device, computer equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination