CN115878513B - Data storage and data query method, device, equipment and storage medium - Google Patents


Info

Publication number
CN115878513B
CN115878513B
Authority
CN
China
Prior art keywords
data
cache
target
cluster
target data
Prior art date
Legal status
Active
Application number
CN202310143690.5A
Other languages
Chinese (zh)
Other versions
CN115878513A
Inventor
尚晶
肖智文
武智晖
郭志伟
陈卓
Current Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd and China Mobile Information Technology Co Ltd


Classifications

    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The embodiment of the application discloses a data storage and data query method, device, equipment, and storage medium, wherein the data storage method comprises the following steps: acquiring the data heat of target data in a distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster; dynamically migrating a target data copy of the plurality of data copies between the first cache cluster and the storage cluster based on the data heat; other data copies of the plurality of data copies are persistently stored in the storage cluster.

Description

Data storage and data query method, device, equipment and storage medium
Technical Field
The present application relates to, but is not limited to, the field of data caching technologies, and in particular, to a data storage and data query method, apparatus, device, and storage medium.
Background
Caching plays a vital role in improving the read performance of a storage system and reducing the response time of an application program. In the related art, caching schemes are mostly implemented in memory, whose storage space is limited, so all data cannot usually be cached. When the queried data is not in memory, only the back-end persistent storage nodes in the distributed storage system can be accessed; however, when data is queried according to the storage mechanism of the distributed storage system, the data response is slow, and it is difficult to meet data query requirements.
Disclosure of Invention
In view of this, the embodiments of the present application at least provide a data storage and data query method, apparatus, device, and storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in one aspect, an embodiment of the present application provides a data storage method, including:
acquiring the data heat of target data in a distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster;
dynamically migrating a target data copy of the plurality of data copies between the first cache cluster and the storage cluster based on the data heat; other data copies of the plurality of data copies are persistently stored in the storage cluster.
In another aspect, an embodiment of the present application provides a data query method, where the method includes:
responding to a query event aiming at target data, sending a first query request to a first cache cluster in a distributed cache system, and receiving first query feedback;
sending a second query request to a storage cluster in the distributed cache system and receiving second query feedback under the condition that the first query feedback characterizes that the target data does not exist in the first cache cluster;
The target data is stored with a plurality of data copies in the distributed cache system, the target data copy in the plurality of data copies is dynamically migrated between the first cache cluster and the storage cluster based on the heat of the target data, and the other data copies in the plurality of data copies are persistently stored in the storage cluster.
In yet another aspect, an embodiment of the present application provides a multi-tier storage system, including a cache controller and a distributed cache system, the distributed cache system including a storage cluster and a first cache cluster, wherein: the distributed cache system is used for storing a plurality of data copies of target data; the cache controller is used for controlling a target data copy in the plurality of data copies to dynamically migrate between the first cache cluster and the storage cluster based on the data heat of the target data, and the other data copies in the plurality of data copies are persistently stored in the storage cluster.
In some embodiments, the cache controller is further configured to control migration of the target data copy from the storage cluster to the first cache cluster if the target data copy is stored in the storage cluster and the data heat exceeds a first threshold.
In some embodiments, the first cache cluster includes a plurality of first cache nodes; the cache controller is further configured to send a first migration instruction to a first control node in the distributed cache system;
the first control node is used for determining a first hash value corresponding to the target data based on a first hash function; determining a first target cache node among the plurality of first cache nodes based on the first hash value; and migrating the target data copy from the storage cluster to the first target cache node.
In some embodiments, the determining a first target cache node among the plurality of first cache nodes based on the first hash value includes:
determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node;
and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
In some embodiments, the cache controller is further configured to control migration of the target data copy from the first cache cluster to the storage cluster if the target data copy is stored in the first cache cluster and the data heat is less than a second threshold.
In some embodiments, the storage cluster includes a plurality of storage nodes; the cache controller is further configured to send a second migration instruction to a first control node in the distributed cache system;
the first control node is used for determining a target storage node in the plurality of storage nodes based on a storage mechanism of the storage cluster; and migrating the target data copy from the first cache cluster to the target storage node.
In some embodiments, the cache controller is further configured to control migration of a data copy of associated data of the target data from the storage cluster to the first cache cluster if the data heat of the target data exceeds a first threshold.
In some embodiments, the multi-tiered storage system further includes a second cache cluster; the cache controller is further configured to control the target data to be cached in the second cache cluster in response to an access event for the target data.
In some embodiments, the second cache cluster includes a plurality of second cache nodes; the cache controller is further configured to send a first cache instruction to a second control node in the second cache cluster;
The second control node is used for determining a second hash value corresponding to the target data based on a second hash function; determining a second target cache node among the plurality of second cache nodes based on the second hash value; and caching the target data to the second target cache node.
In some embodiments, the determining a second target cache node among the plurality of second cache nodes based on the second hash value includes:
determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node;
and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
In some embodiments, the multi-tier storage system further comprises a programmable switch cluster; the second cache cluster responds to an overload event of the second target cache node and sends a second cache instruction to the cache controller;
the cache controller is further configured to turn on a cache function of a target programmable switch connected to the second target cache node in the programmable switch cluster; and caching the target data to the target programmable switch.
In some embodiments, the distributed cache system further comprises a first control node;
the first control node is used for responding to a data update request aiming at the target data and writing data update operation into a log; the data updating request carries updated target data; invalidating target data in the target programmable switch and updating target data in the second target cache node based on the updated target data;
the distributed cache system is further configured to update a plurality of copies of data stored in the distributed cache system for the target data based on the log.
In still another aspect, an embodiment of the present application provides a data storage device, including: the acquisition module is used for acquiring the data heat of the target data in the distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster; a migration module, configured to dynamically migrate, based on the data hotness, a target data copy of the plurality of data copies between the first cache cluster and the storage cluster; other data copies of the plurality of data copies are persisted to the storage cluster.
In still another aspect, an embodiment of the present application provides a data query apparatus, including: the query module is used for responding to a query event aiming at target data, sending a first query request to a first cache cluster in the distributed cache system and receiving first query feedback; the query module is further configured to send a second query request to a storage cluster in the distributed cache system and receive a second query feedback when the first query feedback characterizes that the target data does not exist in the first cache cluster; the target data is stored with a plurality of data copies in the distributed cache system, the target data copy in the plurality of data copies is dynamically migrated between the first cache cluster and the storage cluster based on the heat of the target data, and the other data copies in the plurality of data copies are persistently stored in the storage cluster.
In yet another aspect, an embodiment of the present application provides a computer device including a memory and a processor, where the memory stores a computer program executable on the processor, and where the processor implements some or all of the steps of the above method when the program is executed.
In yet another aspect, embodiments of the present application provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method.
Based on the embodiments provided by the application, the data heat of the target data is monitored, and the target data copy among the plurality of data copies corresponding to the target data is dynamically migrated between the first cache cluster and the storage cluster. In this way, when the target data is hot data, the target data copy can be migrated to the first cache cluster, so that the target data can be obtained directly from the first cache cluster when it is queried, which speeds up the response to data queries and improves the overall cache hit rate. Meanwhile, copying the target data copy, as done in the related art, is replaced by migrating the target data copy, which reduces the burden of maintaining data consistency in the distributed cache system.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram of an implementation flow of a data storage method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a second implementation flow of a data storage method according to an embodiment of the present application;
fig. 3 is a schematic diagram of an implementation flow chart of a data storage method according to an embodiment of the present application;
fig. 4 is a schematic diagram of an implementation flow of a data storage method according to an embodiment of the present application;
fig. 5 is a schematic diagram of an implementation flow of a data query method according to an embodiment of the present application;
fig. 6 is a second schematic implementation flow chart of a data query method according to an embodiment of the present application;
fig. 7 is a schematic diagram III of an implementation flow of a data query method according to an embodiment of the present application;
FIG. 8A is a schematic diagram of a system architecture of a multi-layered memory system according to an embodiment of the present application;
FIG. 8B is a second schematic diagram of a system architecture of a multi-layered memory system according to an embodiment of the present application;
FIG. 8C is a third schematic diagram of a system architecture of a multi-layered memory system according to an embodiment of the present application;
FIG. 9 is a schematic diagram of an alternative architecture of a data heat aware multi-layer memory structure according to an embodiment of the present application;
fig. 10 is a schematic diagram of an implementation flow of a data query method based on data heat sensing according to an embodiment of the present application;
FIG. 11 is a schematic diagram of an implementation flow of a data update method of a multi-layer memory structure based on data heat sensing according to an embodiment of the present application;
FIG. 12 is a schematic diagram of an alternative architecture of a data heat aware multi-layer memory structure provided by an embodiment of the present application;
fig. 13 is a schematic implementation flow diagram of a data query method of an SSD cache cluster according to an embodiment of the application;
fig. 14 is a schematic flowchart of an implementation of a data location determining method according to an embodiment of the present application;
FIG. 15 is a schematic diagram of a data location according to an embodiment of the present application;
FIG. 16 is a schematic flow chart of a copy migration mechanism for data heat awareness according to an embodiment of the present application;
FIG. 17 is a schematic diagram of a storage location of data C in an initial state according to an embodiment of the present application;
FIG. 18 is a schematic diagram of a storage location of data C under frequent access according to an embodiment of the present application;
FIG. 19 is a schematic diagram of storage locations of migrated data C and associated data according to an embodiment of the present application;
Fig. 20 is a schematic diagram of a query flow of the hottest data C according to an embodiment of the present application;
FIG. 21 is a schematic diagram of a query flow of thermal data D according to an embodiment of the present application;
FIG. 22 is a schematic diagram of a query flow of thermal data related data G according to an embodiment of the present application;
FIG. 23 is a schematic diagram of a query flow of cold data A according to an embodiment of the present application;
fig. 24 is a flowchart of a data updating method of data C according to an embodiment of the present application;
FIG. 25 is a schematic diagram of a data storage device according to an embodiment of the present application;
fig. 26 is a schematic diagram of a composition structure of a data query device according to an embodiment of the present application;
fig. 27 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application.
Detailed Description
The technical solution of the present application will be further elaborated with reference to the accompanying drawings and embodiments. The described embodiments should not be construed as limiting the application; all other embodiments obtained by a person skilled in the art without making inventive efforts fall within the scope of protection of the present application.
In the following description, reference is made to "some embodiments", which describe a subset of all possible embodiments; it is to be understood that "some embodiments" may be the same subset or different subsets of all possible embodiments and may be combined with one another without conflict. The terms "first/second/third" are merely used to distinguish similar objects and do not represent a particular ordering of objects; it is understood that "first/second/third" may be interchanged in a particular order or precedence, where allowed, so that the embodiments of the application described herein can be implemented in an order other than that illustrated or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing the application only and is not intended to be limiting of the application.
International Data Corporation (IDC) expects that global data volumes will reach 175ZB by 2025 and mass data storage and access will face significant challenges. Caching plays a vital role in improving the read performance of a storage system and reducing the response time of an application program. However, existing caching techniques have some problems and challenges:
(1) Most of the existing caching schemes are based on memory. Memory read-write speed is high, but compared with a mechanical hard disk, memory is expensive and its storage space is limited, so all data cannot usually be cached. When the queried data is not in memory, only the data copies in the distributed storage system can be accessed, which increases response time.
(2) Due to the volatility of memory, all cached data is lost when a cache node fails. At this time, the data query request is forwarded to the back-end persistent storage server, and querying the data and addressing the mechanical hard disk in the distributed storage system cause higher response delay.
(3) Most current distributed cache systems do not consider the heat difference of cached contents and the unbalanced load among cache nodes. The cache system caches hot spot data to provide the query service. Although the data in the cache system is hot data, the heat of different data is not consistent. If there are a large number of concurrent accesses to some very hot data, the cache nodes caching that hot data become overloaded and the network becomes congested, resulting in service unavailability.
(4) Data consistency between the distributed storage system and the cache is challenging. In distributed storage systems, data is typically persisted in multiple copies in order to provide reliability, availability, and read concurrency of the data. However, this introduces the overhead of data consistency between multiple copies of the same data.
Despite the differences in storage objects and organization management, currently mainstream distributed storage systems all provide three copies of data and corresponding data consistency schemes. In current cache systems, there are three main data caching schemes. The first is to copy persistently stored data into memory, where the in-memory data copy provides the data query service; this scheme is simple, but if the cache node cannot be accessed, data query requests can only be served by the persistent storage system, so fault tolerance is lacking. The second provides multiple copies of the same data in the cache system; this avoids the single point of failure of a single-copy caching scheme and improves data access performance, but it introduces the problem of consistency among the multiple copies in memory and complicates maintaining consistency between the in-memory copies and the copies in persistent storage of the same data. The third is a data coding scheme similar to erasure coding, which improves the fault tolerance of cached data by encoding it and avoids the problem of multi-copy data consistency; however, this approach introduces storage and computation overhead for the check data, as well as computation overhead for recovering the original data from the check data.
Embodiments of the present application provide a data storage method that may be performed by a processor of a computer device. The computer device may be a device with data processing capability, such as a server, a notebook computer, a tablet computer, a desktop computer, a smart television, a set-top box, a mobile device (e.g., a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, and a portable game device). In an embodiment of the present application, the computer device is a cache controller in a multi-tier storage system.
Fig. 1 is a schematic implementation flow diagram of a data storage method according to an embodiment of the present application, as shown in fig. 1, the method includes steps S101 to S102 as follows:
step S101, acquiring the data heat of target data in a distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster.
In some embodiments, the data heat of the target data is related to how the target data is accessed within a preset period. The number of times the target data is accessed within the preset period may be counted, and the data heat of the target data determined based on that access count; alternatively, the access time of the target data within the preset period may be recorded, and the data heat of the target data determined based on that access time. Of course, other methods of determining the data heat may also be used, and the application is not limited in this regard.
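As an illustrative sketch only, the access-count variant of this heat statistic could be kept by the cache controller as a per-key sliding-window counter; the class and method names below are assumptions, not part of the patent.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class HeatTracker:
    """Sliding-window access counter: one possible way to derive data heat
    from accesses within a preset period (the window)."""

    def __init__(self, window_seconds: float = 60.0):
        self.window = window_seconds
        self.accesses = defaultdict(deque)  # key -> timestamps of recent accesses

    def record_access(self, key: str, now: Optional[float] = None) -> None:
        self.accesses[key].append(time.time() if now is None else now)

    def heat(self, key: str, now: Optional[float] = None) -> int:
        """Data heat = number of accesses to `key` within the preset period."""
        now = time.time() if now is None else now
        q = self.accesses[key]
        while q and q[0] < now - self.window:  # discard accesses outside the window
            q.popleft()
        return len(q)
```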
In some embodiments, the distributed cache system may store multiple copies of data corresponding to target data and maintain consistency between the copies of data using the distributed cache system's own multi-copy consistency maintenance mechanism.
In some embodiments, the distributed caching system may deploy distributed storage software, such as an HDFS file system, or the like.
Step S102, dynamically migrating a target data copy in the plurality of data copies between the first cache cluster and the storage cluster based on the data heat; other data copies of the plurality of data copies are persisted to the storage cluster.
In some embodiments, the data read-write performance of the first cache node in the first cache cluster is higher than the data read-write performance of the storage node in the storage cluster.
In some embodiments, the target data copy may be migrated from the storage cluster to the first cache cluster in the event that the data heat characterizes a change in the target data from cold data to hot data.
In some embodiments, the target data copy may be migrated from the first cache cluster into the storage cluster in the event that the data heat characterizes a change in the target data from hot data to cold data.
It should be noted that, after the target data copy is migrated from the storage cluster to the first cache cluster, or after the target data copy is migrated from the first cache cluster to the storage cluster, the multi-copy consistency maintenance mechanism of the distributed cache system is still adopted to maintain consistency between the data copies.
In some embodiments, the storage locations of other data copies in the plurality of data copies are relatively fixed, i.e. are persistently stored in the storage cluster, and the storage locations of corresponding target data copies vary with data heat, and may be stored in the storage cluster or in the first cache cluster.
By way of example, the storage cluster may be an HDD storage cluster and the first cache cluster may be an SSD cache cluster.
Based on the embodiments provided by the application, the data heat of the target data is monitored, and the target data copy among the plurality of data copies corresponding to the target data is dynamically migrated between the first cache cluster and the storage cluster. In this way, when the target data is hot data, the target data copy can be migrated to the first cache cluster, so that the target data can be obtained directly from the first cache cluster when it is queried, which speeds up the response to data queries and improves the overall cache hit rate. Meanwhile, copying the target data copy, as done in the related art, is replaced by migrating the target data copy, which reduces the burden of maintaining data consistency in the distributed cache system.
Fig. 2 is a second alternative flow chart of a data storage method according to an embodiment of the present application, which may be executed by a processor of a computer device. Based on fig. 1, step S102 in fig. 1 may be updated to step S201 to step S202, which will be described in connection with the steps shown in fig. 2.
Step S201, when the target data copy is stored in the storage cluster and the data heat exceeds a first threshold, migrating the target data copy from the storage cluster to the first cache cluster.
When the data heat exceeds the first threshold, this indicates that the target data has changed from cold data to hot data, and the target data copy of the target data is therefore migrated from the storage cluster to the first cache cluster. At this time, the target data copy of the target data is stored in the first cache cluster, and the other data copies of the target data are still stored in the storage cluster. Meanwhile, the distributed cache system maintains consistency between the target data copy and the other data copies.
In some embodiments, the first cache cluster includes a plurality of first cache nodes; the above migration of the target data copy from the storage cluster to the first cache cluster may be implemented through steps S2011 to S2013.
Step S2011, determining a first hash value corresponding to the target data based on a first hash function.
Step S2012, determining a first target cache node among the plurality of first cache nodes based on the first hash value.
In some embodiments, in the above steps S2011 to S2012, the first hash value corresponding to the target data is first calculated through a preset first hash function; the first node hash value corresponding to each first cache node may then be obtained, and the first target cache node is determined among the plurality of first cache nodes based on the numerical relationship between the first hash value and each first node hash value.
In other embodiments, the step S2012 described above may be implemented by: determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node; and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
Wherein the first hash ring may be pre-constructed, the first hash ring includes a predetermined number of positions, where the start point of the first hash ring is 0, the end point is (the predetermined number - 1), and the start point is connected to the end point; the predetermined number may be set to 2^32, for example.
In some embodiments, the management area corresponding to each first cache node may be pre-calculated, including: determining a first node hash value corresponding to an IP address of each first cache node based on a first hash function; taking the remainder of the first node hash value based on the number of positions included in the first hash ring to obtain first node position data; determining a first node location of the first cache node in a first hash ring based on the first node location data; and taking the area from the first node position of the first cache node to the first node position of the next first cache node as the management area of the first cache node according to a preset sequence.
In some embodiments, determining the first data location of the target data in the first hash ring based on the first hash value may include: taking the remainder of the first hash value based on the number of positions included in the first hash ring to obtain target position data; a first data location of the target data in a first hash ring is determined based on the target location data.
And step S2013, migrating the target data copy from the storage cluster to the first target cache node.
Wherein the target data copy is migrated to a first target cache node in the first cache cluster.
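The placement on the first hash ring described above can be sketched as follows. This is a minimal illustration assuming a ring of 2^32 positions and MD5 as the first hash function; the patent prescribes neither choice, and the node IPs and data key are made-up examples.

```python
import bisect
import hashlib

RING_SIZE = 2 ** 32  # assumed number of positions on the first hash ring


def first_hash(value: str) -> int:
    # Any stable hash may serve as the "first hash function"; MD5 is purely illustrative.
    return int(hashlib.md5(value.encode()).hexdigest(), 16)


class FirstHashRing:
    """Each first cache node manages the arc from its own position up to (but not
    including) the position of the next node, as described above."""

    def __init__(self, node_ips):
        # First node position = hash(node IP) mod ring size.
        self.nodes = sorted((first_hash(ip) % RING_SIZE, ip) for ip in node_ips)

    def locate(self, key: str) -> str:
        """Return the first target cache node whose management area contains the key."""
        pos = first_hash(key) % RING_SIZE  # first data position of the target data
        positions = [p for p, _ in self.nodes]
        # Owner = node with the greatest position <= pos, wrapping to the last node.
        idx = bisect.bisect_right(positions, pos) - 1
        return self.nodes[idx][1]


ring = FirstHashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
first_target_cache_node = ring.locate("data_C")  # node that receives the migrated copy
```

The second hash ring described later for the second cache cluster follows the same construction, with an independent second hash function.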
Step S202, when the target data copy is stored in the first cache cluster and the data heat is less than a second threshold, migrating the target data copy from the first cache cluster to the storage cluster.
When the data heat is less than the second threshold, this indicates that the target data has changed from hot data to cold data, and the target data copy of the target data is therefore migrated from the first cache cluster to the storage cluster. At this point, the target data copy and the other data copies of the target data are all stored in the storage cluster. Meanwhile, the distributed cache system maintains consistency between the target data copy and the other data copies.
In some embodiments, the first threshold value may be the same as the second threshold value or may be different.
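A compact sketch of the controller's promotion/demotion decision follows. The threshold values and the function name are illustrative assumptions; in practice the first threshold would typically be set no lower than the second so that the copy does not oscillate between clusters.

```python
from typing import Optional


def plan_migration(data_heat: int, copy_in_first_cache_cluster: bool,
                   first_threshold: int = 100, second_threshold: int = 20) -> Optional[str]:
    """Decide where the dynamically migrated target data copy should go."""
    if not copy_in_first_cache_cluster and data_heat > first_threshold:
        # Cold -> hot: promote the copy from the storage cluster to the first cache cluster.
        return "migrate_to_first_cache_cluster"
    if copy_in_first_cache_cluster and data_heat < second_threshold:
        # Hot -> cold: demote the copy back to the storage cluster.
        return "migrate_to_storage_cluster"
    return None  # leave the copy where it is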
In some embodiments, the storage cluster includes a plurality of storage nodes; the above-mentioned migration of the target data copy from the first cache cluster to the storage cluster may be achieved by step S2021 to step S2022.
Step S2021, determining a target storage node among the plurality of storage nodes based on a storage mechanism of the storage cluster itself.
Step S2022, migrating the target data copy from the first cache cluster to the target storage node.
In some embodiments, the method may further comprise: and under the condition that the data heat of the target data exceeds a first threshold value, migrating the data copy of the associated data of the target data from the storage cluster to the first cache cluster.
In the process of counting the data heat of the target data, the relevance between data items is calculated at the same time, and data whose relevance to the target data exceeds a preset relevance threshold is taken as the associated data of the target data. Like the target data, the associated data also has a plurality of data copies in the distributed cache system. Here, migrating the data copy of the associated data of the target data from the storage cluster to the first cache cluster includes: migrating the target data copy corresponding to the associated data from the storage cluster to the first cache cluster, and persistently storing the other data copies corresponding to the associated data in the storage cluster.
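The patent leaves open how the relevance between data items is computed; one plausible sketch is a co-access counter, shown below with assumed names and an assumed threshold.

```python
from collections import Counter
from itertools import combinations


class AssociationTracker:
    """Co-access counting as one possible relevance measure between data items."""

    def __init__(self, relevance_threshold: int = 10):
        self.threshold = relevance_threshold
        self.co_access = Counter()  # (key_a, key_b) -> times accessed together

    def record_co_access(self, keys_accessed_together) -> None:
        for a, b in combinations(sorted(set(keys_accessed_together)), 2):
            self.co_access[(a, b)] += 1

    def associated_data(self, key: str):
        """Keys whose relevance to `key` exceeds the threshold; when `key` becomes hot,
        one copy of each of these is also migrated to the first cache cluster."""
        related = []
        for (a, b), count in self.co_access.items():
            if count >= self.threshold and key in (a, b):
                related.append(b if a == key else a)
        return related
```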
Based on the above embodiment, when the data heat of the target data is increased and becomes hot data, the target data copy corresponding to the target data may be migrated into the first cache cluster, and at the same time, the first target cache node is determined to be used for caching the target data copy in the first cache cluster based on the first hash function. In this way, in the process of needing to access and inquire the target data, the position of the target data copy in the first cache cluster can be directly determined through the first hash function, so that the response speed of data inquiry is improved, and the overall cache hit rate of the data is improved.
Fig. 3 is a schematic flow chart III of an alternative method for storing data, which can be executed by a processor of a computer device, according to an embodiment of the present application. Based on any of the above embodiments, taking fig. 1 as an example, fig. 1 may further include step S301, which will be described in connection with the steps shown in fig. 3.
Step S301, in response to an access event for the target data, caching the target data to a second cache cluster.
If the target data copy corresponding to the target data exists in the first cache cluster, it can be queried directly from the first cache cluster and cached to the second cache cluster to respond to the access event for the target data; if no target data copy corresponding to the target data exists in the first cache cluster, any one of the data copies corresponding to the target data is queried in the storage cluster and cached to the second cache cluster to respond to the access event for the target data. It should be noted that caching the target data into the second cache cluster is a data replication process, which is different from the migration of the target data copy between the first cache cluster and the storage cluster in the above embodiments.
In some embodiments, the second cache cluster includes a plurality of second cache nodes; the above-mentioned caching of the target data to the second cache cluster may be achieved through step S3011 to step S3013.
Step S3011, determining a second hash value corresponding to the target data based on a second hash function.
Wherein the second hash function is different from the first hash function.
Step S3012, determining a second target cache node from the plurality of second cache nodes based on the second hash value.
In some embodiments, in the above steps S3011 to S3012, the second hash value corresponding to the target data is first calculated through a preset second hash function; the second node hash value corresponding to each second cache node may then be obtained, and the second target cache node is determined among the plurality of second cache nodes based on the numerical relationship between the second hash value and each second node hash value.
In other embodiments, the step S3012 may be implemented as follows: determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node; and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
Wherein the second hash ring may be pre-constructed, the second hash ring includes a predetermined number of positions, where the start point of the second hash ring is 0, the end point is (the predetermined number - 1), and the start point is connected to the end point; the predetermined number may be set to 2^32, for example.
In some embodiments, the management area corresponding to each second cache node may be pre-calculated, including: determining, for each second cache node, a second node hash value corresponding to the IP address of the second cache node based on a second hash function; taking the remainder of the second node hash value based on the number of positions included in the second hash ring to obtain second node position data; determining a second node location of the second cache node in a second hash ring based on the second node location data; and taking the area from the second node position of the second cache node to the second node position of the next second cache node as the management area of the second cache node according to the preset sequence.
In some embodiments, determining the second data location of the target data in the second hash ring based on the second hash value may include: taking the remainder of the second hash value based on the number of positions included in the second hash ring to obtain target position data; a second data location of the target data in a second hash ring is determined based on the target location data.
Step S3013, caching the target data to the second target cache node.
Based on the above embodiment, a second target cache node is determined in the second cache cluster for caching the target data based on the second hash function. In this way, in the process of needing to access and inquire the target data, the position of the target data in the second cache cluster can be directly determined through the second hash function, so that the response speed of data inquiry is improved, and the overall cache hit rate of the data is improved.
In some embodiments, the method may further comprise steps S302 to S303.
Step S302, in response to an overload event of the second target cache node, starting a cache function of a target programmable switch connected to the second target cache node.
Step S303, the target data is cached to the target programmable switch.
In some embodiments, the second cache node in the second cache cluster may be connected to the network through a programmable switch cluster. The programmable switch cluster is composed of a plurality of programmable switches with buffering capacity, and each second buffering node is connected with the corresponding programmable switch.
In some embodiments, the overload event may include the following cases: the load among the second cache nodes is unbalanced because the hot data stored by different second cache nodes has different heat; or the second target cache node is overloaded, or the network is congested, because the second target cache node has cached very hot data. In these cases, the target programmable switch connected to the second target cache node caches the data with the highest access frequency in that memory cache node, i.e. the hottest data HT, which improves the response speed and reduces the load on the second target cache node.
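The controller-side reaction to such an overload event might look like the following sketch; all object interfaces here (switch_connected_to, enable_switch_cache, hottest_key, push_to_switch, read) are hypothetical stand-ins, since the patent specifies the behaviour rather than an API.

```python
def handle_overload(cache_controller, overloaded_node, heat_tracker):
    """On an overload event, enable the connected programmable switch's cache and
    push the hottest data HT of the overloaded memory cache node into it."""
    switch = cache_controller.switch_connected_to(overloaded_node)
    cache_controller.enable_switch_cache(switch)      # turn on the switch cache function
    key = heat_tracker.hottest_key(overloaded_node)   # data with the highest access frequency
    cache_controller.push_to_switch(switch, key, overloaded_node.read(key))
```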
Fig. 4 is a schematic flow chart of an alternative method for storing data according to an embodiment of the present application, which may be executed by a processor of a computer device. Based on fig. 3, the method may further include steps S401 to S403.
Step S401, responding to a data update request for the target data, and writing a data update operation into a log; the data update request carries updated target data.
And step S402, invalidating the target data in the target programmable switch, and updating the target data in the second target cache node based on the updated target data.
Step S403, updating multiple data copies stored in the distributed cache system by the target data based on the log.
Based on the above embodiment, when the system executes the update operation, the decoupling of the memory data update and the persistent storage copy update is realized by updating the data in the memory cache node and writing the update operation into the log, so that the response time of the write operation is shortened.
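Steps S401 to S403 can be summarised in the following sketch; the component objects and their methods are hypothetical, and only the ordering (log first, then invalidate the switch entry and update the memory copy, then update the persistent copies from the log) reflects the text above.

```python
def handle_update(log, target_switch, second_target_cache_node,
                  distributed_cache_system, key, updated_value):
    """Update path that decouples the in-memory update from the persistent-copy update."""
    # Step S401: write the data update operation into the log.
    log.append({"op": "update", "key": key, "value": updated_value})
    # Step S402: invalidate the stale entry in the target programmable switch and
    # update the target data held by the second target cache node.
    target_switch.invalidate(key)
    second_target_cache_node.put(key, updated_value)
    # Step S403: the data copies in the distributed cache system are updated
    # asynchronously based on the log, shortening the write response time.
    distributed_cache_system.apply_log_async(log)
```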
Embodiments of the present application provide a data query method that may be performed by a processor of a computer device. The computer device may be a device with data processing capability, such as a server, a notebook computer, a tablet computer, a desktop computer, a smart television, a set-top box, or a mobile device (e.g., a mobile phone, a portable video player, a personal digital assistant, a dedicated messaging device, or a portable game device). In the embodiment of the present application, the above-mentioned computer device is a terminal device/client device.
Fig. 5 is a schematic flow chart of an alternative method for querying data provided in an embodiment of the present application, which may be executed by a processor of a computer device. The steps shown in fig. 5 will be described.
Step S501, in response to a query event for target data, a first query request is sent to a first cache cluster in the distributed cache system, and a first query feedback is received.
In some embodiments, the client device is the initiator of the data query request. The first query request may carry a data identifier (key) of the target data, and after the first query request is sent to a first cache cluster in the distributed cache system, the first cache cluster may query whether corresponding target data exists in its own cache based on the data identifier of the target data, and generate the first query feedback.
The first query feedback may characterize that the target data does not exist in the first cache cluster; the first query feedback may further characterize that the target data exists in the first cache cluster, where the target data is carried in the first query feedback.
Step S502, when the first query feedback characterizes that the target data does not exist in the first cache cluster, a second query request is sent to a storage cluster in the distributed cache system, and a second query feedback is received.
In some embodiments, the second query request also carries a data identifier (key) of the target data, and after the second query request is sent to a storage cluster in the distributed cache system, the storage cluster may query whether corresponding target data exists in its own cache based on the data identifier of the target data, and generate the second query feedback.
In some embodiments, the target data stores a plurality of data copies in the distributed cache system, target data copies in the plurality of data copies are dynamically migrated between the first cache cluster and the storage cluster based on a hotness of the target data, and other data copies in the plurality of data copies are persistently stored in the storage cluster.
Accordingly, since the other data copies corresponding to the target data are persistently stored in the storage cluster, the second query feedback characterizes that the target data exists in the storage cluster and carries the target data.
Fig. 6 is a second flowchart of an alternative data query method according to an embodiment of the present application, which may be executed by a processor of a computer device. Based on fig. 5, step S501 in fig. 5 may be updated to steps S601 to S603, which will be described in connection with the steps shown in fig. 6.
Step S601, determining a first hash value corresponding to the target data based on a first hash function.
Step S602, determining a first target cache node from the plurality of first cache nodes based on the first hash value.
In some embodiments, the step S602 may be implemented by: determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node. And taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node. And sending the first query request to the first target cache node.
Here, the above steps S601 to S602 correspond to the above steps S2011 to S2012, respectively; for specific implementations, reference may be made to the embodiments of steps S2011 to S2012 described above.
Step S603, sending the first query request to the first target cache node.
In some embodiments, the above step S502 may be updated to steps S604 to S605.
Step S604, determining a target storage node in the plurality of storage nodes based on a storage mechanism of the storage cluster;
step S605, sending the second query request to the target storage node.
Fig. 7 is a schematic diagram of an alternative flow chart of a data query method according to an embodiment of the present application, which may be executed by a processor of a computer device. Based on fig. 5, step S501 in fig. 5 may be updated to steps S701 to S702, which will be described in connection with the steps shown in fig. 7.
Step S701, in response to a query event for the target data, a third query request is sent to the second cache cluster, and a third query feedback is received.
In some embodiments, the sending of the third query request to the second cache cluster may be implemented through steps S7011 to S7013.
Step S7011, determining a second hash value corresponding to the target data based on a second hash function.
Step S7012 determines a second target cache node from the plurality of second cache nodes based on the second hash value.
The step S7012 may be implemented by: determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node; and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
Step S7013, a third query request is sent to the second target cache node.
In some embodiments, the sending of the third query request to the second target cache node may be implemented by: sending the third query request to a target programmable switch connected with the second target cache node; the third query request is used for indicating the target programmable switch to query the target data in a switch cache, and forwarding the third query request to the second target cache node when the target data does not exist in the switch cache.
Here, the steps S7011 to S7012 correspond to the steps S3011 to S3012, respectively, and the specific embodiments of the steps S3011 to S3012 may be referred to in the implementation.
Accordingly, the above-described receiving of the third query feedback may be accomplished by: and receiving the third query feedback sent by the second target cache node.
In other embodiments, the third query request is further configured to instruct the target programmable switch to query the switch cache for the target data, and send a third query feedback carrying the target data if the target data exists in the switch cache.
Accordingly, the above-described receiving of the third query feedback may be accomplished by: and receiving third query feedback carrying the target data, which is sent by the target programmable switch.
Step S702, when the third query feedback characterizes that the target data does not exist in the second cache cluster, a first query request is sent to the first cache cluster in the distributed cache system, and a first query feedback is received.
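Putting the query flows of Figs. 5 to 7 together, the end-to-end lookup can be sketched as below. The cluster objects and their get/put methods are hypothetical; a miss at every level is not possible because the storage cluster always holds persistent copies of the target data.

```python
def query_target_data(key, memory_cache_cluster, first_cache_cluster, storage_cluster):
    """Three-level lookup: second (memory) cache cluster, first (SSD) cache cluster,
    then the storage cluster that persistently stores the other data copies."""
    value = memory_cache_cluster.get(key)      # third query request (Fig. 7)
    if value is not None:
        return value
    value = first_cache_cluster.get(key)       # first query request (Fig. 5)
    if value is None:
        value = storage_cluster.get(key)       # second query request, always hits
    memory_cache_cluster.put(key, value)       # cache the result for later accesses (step S301)
    return value
```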
Fig. 8A is a schematic diagram of an alternative storage structure of a multi-tier storage system according to an embodiment of the present application, where the multi-tier storage system 800 includes a distributed cache system 810 and a cache controller 820, and the distributed cache system 810 includes a storage cluster 811 and a first cache cluster 812, where:
The distributed cache system 810 is configured to store multiple copies of data of target data;
the cache controller 820 is configured to control a target data copy of the plurality of data copies to be dynamically migrated between the first cache cluster 812 and the storage cluster 811 based on data heat of the target data, and other data copies of the plurality of data copies are persistently stored in the storage cluster 811.
In some embodiments, the cache controller 820 is further configured to control migration of the target data from the storage cluster 811 to the first cache cluster 812 if the target data is stored in the storage cluster 811 and the data heat exceeds a first threshold.
In some embodiments, the first cache cluster 812 includes a plurality of first cache nodes; the cache controller 820 is further configured to send a first migration instruction to a first control node in the distributed cache system 810;
the first control node is used for determining a first hash value corresponding to the target data based on a first hash function; determining a first target cache node among the plurality of first cache nodes based on the first hash value; the target data is migrated from the storage cluster 811 to the first target cache node.
In some embodiments, the determining a first target cache node among the plurality of first cache nodes based on the first hash value includes:
determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node;
and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
In some embodiments, the cache controller 820 is further configured to control migration of the target data from the first cache cluster 812 to the storage cluster 811 if the target data is stored in the first cache cluster 812 and the data heat is less than a second threshold.
In some embodiments, the storage cluster 811 includes a plurality of storage nodes; the cache controller 820 is further configured to send a second migration instruction to the first control node in the distributed cache system 810;
the first control node is configured to determine a target storage node from the plurality of storage nodes based on a storage mechanism of the storage cluster 811 itself; the target data is migrated from the first cache cluster 812 to the target storage node.
In some embodiments, the cache controller 820 is further configured to control migration of a data copy of associated data of the target data from the storage cluster 811 to the first cache cluster 812 if the data heat of the target data exceeds a first threshold.
Fig. 8B is a schematic diagram of another alternative storage structure of the multi-layer storage system according to the embodiment of the present application. In some embodiments, the multi-tiered storage system 800 further includes a second cache cluster 830; the cache controller 820 is further configured to control the caching of the target data into the second cache cluster 830 in response to an access event for the target data.
In some embodiments, the second cache cluster 830 includes a plurality of second cache nodes; the cache controller 820 is further configured to send a first cache instruction to a second control node in the second cache cluster 830;
the second control node is used for determining a second hash value corresponding to the target data based on a second hash function; determining a second target cache node among the plurality of second cache nodes based on the second hash value; and caching the target data to the second target cache node.
In some embodiments, the determining a second target cache node among the plurality of second cache nodes based on the second hash value includes: determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node; and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
FIG. 8C is a schematic diagram of a third alternative memory architecture of a multi-layered memory system according to an embodiment of the present application. In some embodiments, the multi-tier storage system 800 further comprises a programmable switch cluster 840; the second cache cluster 830 sends a second cache instruction to the cache controller 820 in response to an overload event of the second target cache node;
the cache controller 820 is further configured to turn on a cache function of a target programmable switch connected to the second target cache node in the programmable switch cluster 840; and caching the target data to the target programmable switch.
In some embodiments, the distributed cache system 810 further comprises a first control node;
The first control node is used for responding to a data update request aiming at the target data and writing data update operation into a log; the data updating request carries updated target data; invalidating target data in the target programmable switch and updating target data in the second target cache node based on the updated target data;
the distributed cache system 810 is further configured to update a plurality of copies of data stored in the distributed cache system 810 for the target data based on the log.
The following embodiments take the first cache cluster as an SSD cache cluster, the second cache cluster as a memory cache cluster, and the storage cluster as an HDD storage cluster as an example for the description of the scene embodiments.
The application provides a data heat aware multi-layer distributed cache structure and method, including (1) a data heat aware hierarchical storage structure, (2) a data heat aware copy promotion/demotion migration mechanism, (3) a method of using a data copy as a cache, RaaC (Replication as a Cache), and (4) a hottest-data caching mechanism based on a programmable switch.
Referring to fig. 9, an alternative structure diagram of a data heat sensing multi-layer memory structure according to an embodiment of the application is shown.
As shown in fig. 9, the data heat aware multi-layer storage structure includes: distributed cache system 910, memory cache cluster 920, programmable switch cluster 930, cache controller 940, and client 950.
The distributed cache system 910 includes an HDD storage cluster 911 and an SSD cache cluster 912. The distributed cache system 910 stores data in three copies: the storage locations of two copies are relatively fixed, while the storage location of the other copy varies with data heat. The distributed cache system 910 may deploy distributed storage software, such as an HDFS file system.
The HDD storage cluster 911 is composed of several HDD storage nodes and is responsible for the persistent storage of two or three copies of each piece of data. The HDD storage nodes are independent of one another and may use the same or different data storage types and management approaches.
The SSD cache cluster 912 is composed of a number of SSD cache nodes and caches one copy of hot data and of the data associated with the hot data. The SSD cache cluster 912 distributes the hot data and associated data across the individual SSD cache nodes by a consistent hashing algorithm, and the client locates the SSD cache node and queries the target data through the same hash algorithm.
In the distributed cache system 910, each piece of hot data or associated data keeps two copies in the HDD storage cluster and has one copy cached in the SSD cache cluster. Consistency among the three data copies is maintained by the distributed cache system itself, that is, by the multi-copy consistency maintenance mechanism of a distributed cache system such as HDFS.
The SSD cache cluster is a component of the distributed cache system; it provides a cache service for the data query requests of the distributed cache system and supplies load balancing capability for the distributed cache system.
The programmable switch cluster 930 is composed of a number of programmable switches with caching capability. The memory cache nodes in the memory cache cluster 920 are connected to the network through the programmable switches. When differences in the heat of the hot data stored by different memory cache nodes cause load imbalance among them, or when an extremely hot cached item overloads a memory cache node or congests the network, the programmable switch connected to that memory cache node can cache the data with the highest access frequency in the node, i.e., the hottest data HT. This improves response speed and reduces the load on the memory cache node.
The cache controller 940 is responsible for managing data heat and data associations, managing the copy migration of hot data and associated data, and controlling the caches of the programmable switches. (1) The cache controller 940 periodically counts the access frequency of each piece of data over a time window, grades the data heat, and calculates the correlation between data items. (2) After detecting that the heat of a piece of data reaches a preset threshold, the cache controller 940 notifies the control node of the distributed cache system 910 to perform data migration. (3) When overload or network congestion occurs, the memory cache node informs the cache controller, and the cache controller 940 enables the caching function of the programmable switch connected to that node so that the hottest data in the memory cache node is cached there.
The client 950 is the initiator of data query requests. The client 950 locates target data in the memory cache cluster and in the SSD cache cluster through two independent hash functions, respectively, and the header of each query request packet sent by the client 950 carries the Key of the target data. When the caching function of a programmable switch is not enabled, the switch only forwards the packet and does not inspect the target data Key field; when the caching function is enabled, the switch reads the target data Key field in the packet header and matches it against the entries in its own cache region.
The application classifies data into four categories according to data heat: the hottest data HT, hot data HD, hot-data-associated data HR, and cold data CD. Data heat refers to the frequency with which the data is queried per unit time.
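To make this classification concrete, the following minimal sketch shows how a cache controller might count accesses per time window and map them to the four categories. It is written in Python for illustration only; the threshold values, class name and method names are assumptions that are not specified by this embodiment.

from collections import Counter

HOT_THRESHOLD = 100       # queries per window taken as hot data HD (assumed value)
HOTTEST_THRESHOLD = 1000  # queries per window taken as hottest data HT (assumed value)

class HeatMonitor:
    def __init__(self):
        self.access_counts = Counter()  # key -> number of queries in the current window
        self.associations = {}          # key -> set of keys associated with it

    def record_access(self, key):
        self.access_counts[key] += 1

    def classify(self, key):
        # Return 'HT', 'HD', 'HR' or 'CD' for the current window.
        freq = self.access_counts[key]
        if freq >= HOTTEST_THRESHOLD:
            return 'HT'
        if freq >= HOT_THRESHOLD:
            return 'HD'
        # data associated with some hot or hottest item is classified HR
        for hot_key, related in self.associations.items():
            if key in related and self.access_counts[hot_key] >= HOT_THRESHOLD:
                return 'HR'
        return 'CD'

    def end_window(self):
        # called periodically; heat is re-evaluated window by window
        self.access_counts.clear()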
Based on the multi-layer storage architecture, the application provides a data-heat-aware copy promotion and migration mechanism: when the data heat rises, a data copy is promoted from the HDD storage cluster to the SSD cache cluster; when the data heat falls, the data copy is sunk from the SSD cache cluster back to the HDD storage cluster.
The cache controller analyzes data heat from the data access frequency over a time window. When the heat of a piece of data exceeds a threshold and the data becomes hot, the cache controller informs the control node of the distributed cache system to promote the hot data and its associated data. The control node uses the consistent hash function h2 to calculate, for the hot data and its associated data, the destination SSD cache node in the SSD cache cluster, and then promotes one copy of the hot data and one copy of the associated data to that destination node. After the copies are promoted, the control node of the distributed cache system continues to maintain consistency among the three copies of the data.
When the access frequency of hot data decreases over a period of time and its heat falls below a set threshold, the cache controller informs the control node of the distributed cache system to sink the data and its associated data according to the cache update strategy. The control node of the distributed cache system then sinks the data and its associated data into the HDD storage cluster using its own copy migration scheme.
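The promotion and demotion decision described above can be summarized with the following sketch. The thresholds and the control-node interface names (promote_to_ssd, demote_to_hdd) are assumptions used only for illustration; in this embodiment the actual migration is carried out by the control node of the distributed cache system.

PROMOTE_THRESHOLD = 100  # assumed heat threshold for promotion to the SSD cache cluster
DEMOTE_THRESHOLD = 10    # assumed heat threshold for sinking back to the HDD storage cluster

def on_heat_update(key, heat, location, associated_keys, control_node):
    # location is 'HDD' or 'SSD' for the dynamically migrated copy of this key
    if location == 'HDD' and heat > PROMOTE_THRESHOLD:
        # promote one copy of the hot data and of each associated item;
        # the control node picks the destination SSD node with hash function h2
        control_node.promote_to_ssd(key)
        for related in associated_keys:
            control_node.promote_to_ssd(related)
    elif location == 'SSD' and heat < DEMOTE_THRESHOLD:
        # sink the copies back into the HDD storage cluster
        control_node.demote_to_hdd(key)
        for related in associated_keys:
            control_node.demote_to_hdd(related)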
The present application provides a Replication-as-a-Cache (RaaC) method that uses a data copy as a cache. One copy of the hot data and of its associated data in the distributed cache system is promoted, via the consistent hashing algorithm, to an SSD cache node of the SSD cache cluster.
The client queries the hot data and associated data in the SSD cache cluster directly through the same consistent hash, without going through the addressing mechanism of the distributed cache system. During queries, the data copy in the SSD cache cluster therefore acts as a cache for the distributed cache system and can provide load balancing among the HDD storage nodes.
The present application provides a programmable-switch-based caching mechanism for the hottest data. Differences in the heat of the data stored by different memory cache nodes can cause load imbalance among them, and caching extremely hot data can overload a memory cache node or congest the network. When its load exceeds a threshold, a memory cache node reports its hottest data to the cache controller; the cache controller then enables the caching function of the programmable switch connected to that node and caches the hottest data in the switch. When a query request from a client arrives and the caching function of the programmable switch is enabled, the query object in the request is hash-compared with the data cached by the switch: if the queried data object is in the switch cache, the data is taken from the cache and returned to the client; if the query object does not match the cached content, the query request is forwarded.
After the load of the memory cache node drops below the threshold, the memory cache node notifies the cache controller, and the cache controller disables the caching function of the programmable switch connected to that node.
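A minimal sketch of this overload-driven switch caching behavior is given below. The load threshold and the object interfaces are assumptions; a real programmable switch would hold the hottest entries in match-action tables in its data plane rather than in a Python dictionary.

LOAD_THRESHOLD = 0.8  # assumed load threshold of the memory cache node

class SwitchCache:
    def __init__(self):
        self.enabled = False
        self.entries = {}  # Key -> cached value of the hottest data

    def enable(self, hottest_items):
        self.enabled = True
        self.entries = dict(hottest_items)

    def disable(self):
        self.enabled = False
        self.entries.clear()

    def handle_query(self, key, forward):
        # answer from the switch cache when possible, otherwise forward the request
        if self.enabled and key in self.entries:
            return self.entries[key]   # returned directly to the client
        return forward(key)            # passed through to the memory cache node

def on_node_load_report(load, hottest_items, switch_cache):
    if load > LOAD_THRESHOLD and not switch_cache.enabled:
        switch_cache.enable(hottest_items)
    elif load <= LOAD_THRESHOLD and switch_cache.enabled:
        switch_cache.disable()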
Fig. 10 is a schematic implementation flow chart of a data query method based on data heat sensing according to an embodiment of the present application.
Step S1001, the client calculates, using hash function h1, which memory cache node in the memory cache cluster the target data is stored in;
step S1002, a client sends a query request to a determined memory cache node;
step S1003, when the programmable switch receives the inquiry request from the client, judging whether the buffer function is started;
if yes, executing step S1004, otherwise executing step S1006;
step S1004, the programmable switch judges whether a target data Key carried in the query request is consistent with a data Key cached by the programmable switch;
if the two are consistent, the following step S1005 is executed, otherwise, step S1006 is executed;
step S1005, the programmable switch returns the cached data to the client;
step S1006, the programmable switch forwards the data query request to the memory cache node;
step S1007, when the memory cache node receives the data query request, judging whether the cache data contains the target data of the query request;
If the target data in the query request is cached, step S1008 is executed, otherwise step S1009 is executed;
step S1008, the memory caching node returns the cached target data to the client;
in step S1009, the memory cache node notifies the client of the query failure message. After receiving the query failure message from the memory cache node, the client calculates, using hash function h2, which SSD cache node in the SSD cache cluster the target data is stored in;
step S1010, the client sends a query request to the determined SSD cache node;
step S1011, when the SSD cache node receives a data query request, judging whether the cache data contains target data of the query request;
if the target data in the query request is cached, step S1012 is executed, otherwise step S1013 is executed;
step S1012, the SSD cache node returns the cached target data to the client;
in step S1013, the SSD cache node notifies the client of the query failure message. After receiving the query failure message from the SSD cache node, the client sends a query request to a control node of the distributed cache system;
step S1014, after receiving the query request from the client, the control node of the distributed cache system returns the position of the data storage node where the target data is located to the client according to the storage mechanism of the distributed cache system;
In step S1015, the client initiates a query request to the data storage node according to the received information of the data storage node, to obtain the target data.
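The client-side cascade of steps S1001 to S1015 can be sketched as follows. The ring and node interfaces (locate, get) are assumed names that stand for the h1/h2 consistent-hash lookups and the corresponding query requests; they are not part of this embodiment.

def query(key, memory_ring, ssd_ring, control_node):
    # 1. locate and query the memory cache node chosen by hash function h1
    mem_node = memory_ring.locate(key)      # h1
    value = mem_node.get(key)               # may in fact be answered by the programmable switch
    if value is not None:
        return value
    # 2. on a miss, locate and query the SSD cache node chosen by hash function h2
    ssd_node = ssd_ring.locate(key)         # h2
    value = ssd_node.get(key)
    if value is not None:
        return value
    # 3. on a second miss, fall back to the distributed cache system: the control
    #    node returns the HDD storage node that holds the target data
    storage_node = control_node.locate(key)
    return storage_node.get(key)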
Fig. 11 is a schematic implementation flow chart of a data updating method of a multi-layer storage structure based on data heat sensing according to an embodiment of the present application, and the specific steps are as follows:
Step S1101, the client sends a data update request to the control node of the distributed cache system;
step S1102, a control node of the distributed cache system writes the update operation into a log;
step S1103, the control node of the distributed cache system informs the cache controller, and the cache controller determines whether the updated data is in the cache of the programmable switch, if so, the updated data is invalidated;
step S1104, the control node of the distributed cache system uses a consistent hash algorithm to find a memory cache node cached by the target update data, and writes the new data into the memory cache node in the memory cache cluster;
in step S1105, the distributed cache system updates data according to the operation steps in the log.
When the system executes an update operation, the method updates the data in the memory cache node and writes the update operation to the log, thereby decoupling the in-memory data update from the update of the persistent storage copies and shortening the response time of the write operation.
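A minimal sketch of this decoupled update path is shown below; the object and method names are assumptions used only for illustration.

def update(key, new_value, control_node, cache_controller, memory_ring, log):
    # 1. the control node first writes the update operation into the log
    log.append(('update', key, new_value))
    # 2. any copy cached in a programmable switch is invalidated
    cache_controller.invalidate_switch_cache(key)
    # 3. the new value is written to the memory cache node chosen by hash function h1
    memory_ring.locate(key).put(key, new_value)
    # 4. the persistent copies in the SSD cache cluster and the HDD storage
    #    cluster are brought up to date later by replaying the log
    control_node.schedule_log_replay()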
The beneficial effects of the application are as follows: the data storage mode and data location in the data-heat-aware multi-layer storage structure change along with the data heat, which ensures load balance across the storage servers;
the data-heat-aware copy promotion and migration mechanism avoids the network congestion and packet loss caused by a large number of concurrent user requests for hot data in the memory cache, and separates management operations such as data updating and copy consistency from data access;
in addition, the copy-as-cache (RaaC) method uses the SSD cache cluster to implement copy caching, thereby simplifying maintenance of data consistency.
The data-heat-aware multi-layer storage structure of fig. 9 includes the distributed cache system 910, the memory cache cluster 920, the programmable switch cluster 930, the cache controller 940 and the client 950. In one embodiment, data is persisted in the HDD storage cluster 911, and the distributed storage cluster employs the distributed file system HDFS; the HDD storage cluster 911 has 4 nodes, the SSD cache cluster 912 has 3 nodes, the memory cache cluster 920 has 3 nodes, and the programmable switch cluster has 3 nodes, each directly connected to a memory cache node.
For a clearer description of an embodiment of the present application, the following embodiment describes a data caching process in which data A is cold data, data B is cold data, data C is the hottest data, data D is hot data, data E is associated data of C, data F is cold data, and data G is associated data of D.
Fig. 12 is a schematic diagram of an alternative structure of a data heat aware multi-layer storage structure according to an embodiment of the present application, where 1211 to 1213 are programmable switches, 1221 to 1223 are memory cache nodes, 1231 to 1233 are SSD cache nodes, 1241 to 1244 are HDD storage nodes, and data are stored in different locations according to the heat of the data.
Three copies of the cold data A, B, F, H are stored in the HDD storage cluster of the distributed cache system. Two copies of the hot data C, D and of the associated data E, G are stored in the HDD storage cluster of the distributed cache system, and one copy of the hot data C, D and of the associated data E, G is stored in the SSD cache cluster of the distributed cache system; the memory cache cluster caches the hot data C, D; and the programmable switch 1211, connected to the memory cache node 1221, caches the hottest data C.
It should be noted that the client sends query requests to the SSD cache cluster through a consistent hash algorithm, and the hash function used by the SSD cache cluster is h2. The consistent hash algorithm is implemented with a consistent hash ring whose starting point is 0 and whose end point is 2^32 - 1, the end point being joined back to the starting point. Fig. 13 is a data query access diagram of an SSD cache cluster according to an embodiment of the application. The locations of the SSD cache nodes and the cached data on the hash ring are shown in fig. 13.
In some embodiments, the flow for determining the locations of the SSD cache nodes 1231 to 1233 and of the data C, D, E, G is shown in fig. 14.
S1401, calculate the hash values of the IP addresses of the SSD cache nodes 1231 to 1233 using a hash function, and determine each node's position on the hash ring by taking its hash value modulo 2^32.
In this embodiment, the IP address of the SSD cache node 1231 is 10.128.226.24, the IP address of the SSD cache node 1232 is 10.128.226.25, and the IP address of the SSD cache node 1233 is 10.128.226.26. The hash values of the IP addresses of SSD cache nodes 1231 through 1233 are calculated using hash function h2 and taken modulo 2^32 to determine the positions of the nodes on the hash ring.
S1402, calculate the hash values of the data C, D, E, G using the same hash function, and determine each item's position on the hash ring by taking its hash value modulo 2^32.
In some embodiments, the positions of the data C, D, E, G on the hash ring are obtained in this way.
s1403, the closest SSD cache nodes to the data C, D, E, G are searched clockwise on the hash ring, and the data C, D, E, G is stored in the closest SSD cache nodes.
In some embodiments, the SSD cache nodes closest to data C, D, E, G are each looked up clockwise on the hash ring, and data C, D, E, G is stored to the closest SSD cache node. Data D and E are stored to SSD cache node 1231; data C is stored to SSD cache node 1232; data G is stored to SSD cache node 1233.
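The placement just described can be reproduced with a small consistent-hash ring sketch. The hash function below (MD5 reduced modulo 2^32) is only a stand-in for h2, so the node assignments it prints are illustrative and need not match the ones listed above.

import bisect
import hashlib

RING_SIZE = 2 ** 32

def h2(value):
    # stand-in for the cluster's hash function; returns a position on the ring
    return int(hashlib.md5(value.encode()).hexdigest(), 16) % RING_SIZE

class ConsistentHashRing:
    def __init__(self, node_ips):
        # place each cache node on the ring by hashing its IP address
        self.nodes = sorted((h2(ip), ip) for ip in node_ips)
        self.positions = [pos for pos, _ in self.nodes]

    def locate(self, key):
        # clockwise lookup: the first node at or after the key's position
        idx = bisect.bisect_left(self.positions, h2(key))
        if idx == len(self.nodes):  # wrap around past the end point of the ring
            idx = 0
        return self.nodes[idx][1]

ring = ConsistentHashRing(['10.128.226.24', '10.128.226.25', '10.128.226.26'])
for key in ('C', 'D', 'E', 'G'):
    print(key, '->', ring.locate(key))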
It should be noted that, the memory cache cluster also adopts a consistent hash algorithm, and the determination of the data storage position is consistent with the process of determining the data storage position in the SSD cache cluster, but the hash function used by the memory cache cluster is h1, which is different from the hash function h2 used by the SSD cache cluster. Specifically, the locations of the memory cache nodes and the cache data on the hash ring are shown in fig. 15.
Memory cache node 1221 is located at position 2048 on the hash ring, memory cache node 1222 at position 98304, and memory cache node 1223 at position 786432; data C is stored in memory cache node 1221, and data D is stored in memory cache node 1222.
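As a usage sketch, the clockwise lookup on the h1 ring can be reproduced with the node positions given above. The positions assumed below for data C and D are not stated in this embodiment; they are chosen only so that the lookup lands on nodes 1221 and 1222 as described.

import bisect

RING_SIZE = 2 ** 32
nodes = [(2048, 'memory cache node 1221'), (98304, 'memory cache node 1222'), (786432, 'memory cache node 1223')]
positions = [pos for pos, _ in nodes]

def locate(data_pos):
    # clockwise lookup of the first node at or after the data position
    idx = bisect.bisect_left(positions, data_pos % RING_SIZE)
    return nodes[idx % len(nodes)][1]

print(locate(1000))    # assumed h1 position of data C -> memory cache node 1221
print(locate(50000))   # assumed h1 position of data D -> memory cache node 1222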
In another embodiment of the present application, a data-heat-aware copy promotion and migration mechanism is provided, wherein the distributed cache system is divided into an HDD storage cluster and an SSD cache cluster, data is stored as three copies, two of which are persistently stored on the HDD, while the other data copy is dynamically migrated between the HDD and the SSD according to the data heat.
Specifically, fig. 16 is a schematic flow chart of the data-heat-aware copy migration mechanism in the embodiment of the present application, and describes the workflow taking the heat-rise (promotion) migration process of data C as an example. The initialized state of the system is shown in fig. 17: data A, C, E are cold data, with three copies stored in the HDD storage cluster; data D is hot data, with two copies stored in the HDD storage cluster and one copy stored in the SSD cache cluster; data G is associated data of D, with two copies stored in the HDD storage cluster and one copy stored in the SSD cache cluster. As shown in fig. 16, the heat-rise migration of data C specifically includes:
s1601, three copies of data C in an initial state are stored in the HDD storage cluster, and when the data C is accessed frequently, one copy of the data C is migrated to the SSD cache cluster.
As shown in fig. 17, three copies of data C in the initial state are stored in the HDD storage cluster;
as shown in fig. 18, when the data C is frequently accessed, the control node of the distributed cache system uses the hash function h2 to determine that the cache node for data C in the SSD cache cluster is SSD cache node 2, and migrates one copy of data C to SSD cache node 2;
s1602, the buffer controller periodically monitors the heat of the data and calculates the data relevance;
s1603, when the cache controller monitors that the data C has the associated data E, the cache controller informs a control node of the distributed cache system to promote one copy of the associated data E of the hot data C;
S1604, the control node of the distributed cache system calculates the promotion destination node of data E as SSD cache node 1 using the consistent hash function h2.
The control node of the distributed cache system calculates the promotion destination node of data E as SSD cache node 1 using the consistent hash function h2. After obtaining the destination SSD cache node, the control node migrates one copy of data E to that node; the migration result is shown in fig. 19.
In some embodiments, the cooling migration process of the data is similar to the steps of the above embodiments, and will not be described herein.
In some embodiments, query requests for data are handled by different clusters depending on the heat of the data requested by the client. These include: a query request flow for the hottest data HT; a query request flow for hot data HD; a query request flow for the hot-data-associated data HR; and a query request flow for cold data CD.
Specifically, fig. 20 is a schematic diagram of a query flow for the hottest data C in fig. 12 according to an embodiment of the present application, where the steps are as follows:
s2001, the client calculates a memory cache node where the data C is located by using a hash function h 1;
the client calculates a memory cache node where the data C is located according to a hash function h1 used by the memory cache cluster, where the data C is cached in the memory cache node 1221 in this embodiment;
s2002, the client sends a request for inquiring data C to the memory cache node;
wherein, the client sends a request for querying the data C to the memory cache node 1221 obtained in S2001;
s2003, the caching function of the programmable switch is started, the cache hits, and the switch returns data;
after receiving the request for querying the data C, the programmable switch 1211 compares the target data Key carried in the query request with the data Key cached by the programmable switch, where the two are consistent, and the programmable switch returns the cached data C to the client.
Fig. 21 is a schematic flow chart of the query for the hot data D in fig. 12 according to an embodiment of the present application, and the steps are as follows:
s2101, a client calculates a memory cache node where data D is located by using a hash function h 1;
the client calculates a memory cache node where the data D is located according to a hash function h1 used by the memory cache cluster, where the data D is cached in the memory cache node 1222 in this embodiment;
s2102, a client data access layer sends a request for inquiring data D to a memory cache node;
the client sends a request for querying the data D to the memory cache node 1222 obtained in S2101;
S2103, the caching function of the programmable switch is not enabled, or it is enabled but the cache misses, and the request is forwarded to the memory cache node;
wherein the caching function of the programmable switch 1212 is not enabled, or it is enabled but the Key of data D carried in the query request does not match any cached data Key, so the switch forwards the query for D to the memory cache node 1222;
s2104, memory cache hits, and data is returned;
wherein the memory cache node 1222 caches the hit, returning data D to the client.
Fig. 22 is a schematic diagram of the query flow for the hot-data-associated data G in fig. 12 according to an embodiment of the present application, where the steps are as follows:
S2201, a client data access layer uses a hash function h1 to calculate a memory cache node where data G is located;
the client calculates a memory cache node where the data G is located according to a hash function h1 used by the memory cache cluster, wherein the data G does not appear in the memory cache cluster in the embodiment;
s2202, the client sends a request for inquiring data G to the memory cache node;
the client sends a request for inquiring the data G to the memory cache node obtained in the step S2201;
S2203, the caching function of the switch is not enabled, or it is enabled but the cache misses, and the request is forwarded to the memory cache node;
the query request reaches the programmable switch; the caching function of the programmable switch is not enabled, or the Key of data G carried in the query request does not match any cached data Key, so the query request is forwarded to the memory cache node;
s2204, the memory cache is not hit, and the client is notified of query failure;
in this embodiment, the data G does not appear in the memory cache cluster, and the client is notified of the query failure message due to the memory cache miss;
s2205, the client calculates an SSD cache node where the data G is located by using a hash function h 2;
after receiving the query failure message from the memory caching node, the client calculates an SSD caching node where the data G is located according to a hash function h2 used by the SSD caching cluster, where in this embodiment, the data G is cached in the SSD caching node 1233;
S2206, the client sends a request for inquiring the data G to the SSD cache node;
the client sends a request for querying the data G to the SSD cache node 1233 obtained in S2205;
s2207, caching hit by the SSD cache node, and returning data;
wherein SSD cache node 1233 caches the hit, returning data G to the client.
Fig. 23 is a flow chart of query for the cold data a in fig. 12 according to an embodiment of the present application, which includes the following steps:
s2301, the client calculates a memory cache node where the data A is located by using a hash function h 1;
the client calculates a memory cache node where the data A is located according to a hash function h1 used by the memory cache cluster, wherein the data A does not appear in the memory cache cluster in the embodiment;
s2302, the client sends a request for inquiring the data A to the memory cache node;
the client sends a query request of the data A to the memory cache node obtained in the step S2301;
S2303, the caching function of the switch is not enabled, or it is enabled but the cache misses, and the request is forwarded to the memory cache node;
when the query request reaches the programmable switch, the caching function of the programmable switch is not enabled, or the Key of data A carried in the query request does not match any cached data Key, so the query request is forwarded to the memory cache node;
S2304, notifying the client of query failure when the memory cache is not hit;
in this embodiment, the data a does not appear in the memory cache cluster, and the client is notified of the query failure message due to the memory cache miss;
S2305, the client data access layer uses hash function h2 to calculate the SSD cache node where data A would be located;
after receiving the query failure message from the memory cache node, the client calculates an SSD cache node where the data a is located according to a hash function h2 used by the SSD cache cluster, where the data a is not present in the SSD cache cluster in this embodiment;
s2306, the client sends a request for inquiring the data A to the SSD cache node;
the client sends a request for inquiring the data A to the SSD cache node obtained in the step S2305;
s2307, the SSD cache node caches the miss, and informs the client of the query failure;
in this embodiment, data a does not appear in the SSD cache cluster, and the SSD cache node caches the miss, notifying the client of the query failure message;
s2308, the client sends a query request of the data A to a control node of the distributed cache system;
after receiving the query failure message from the SSD cache node, the client sends a query request of the data A to a control node of the distributed cache system;
S2309, the control node of the distributed cache system returns the position of the data storage node where the data A is located to the client;
after receiving a query request from a client, a control node of the distributed cache system returns the position of a data storage node where data A is located to the client according to a storage mechanism of the distributed cache system;
s2310, the client initiates a query request to the data storage node to obtain data A;
the client initiates a query request to the data storage node according to the data storage node information received in S2309, so as to obtain data a.
The embodiment of the application provides a data updating method, which includes the following steps: after receiving an update request from a client, the control node of the distributed cache system first writes the update operation into the log system of the distributed cache system; it then notifies the cache controller to invalidate the data cached in the programmable switch and updates the data in the memory cache cluster; finally, the data in the distributed cache system is updated according to the log.
Specifically, fig. 24 is a schematic flow chart of updating the data C in fig. 12 according to an embodiment of the present application, and as shown in fig. 24, the steps are as follows:
S2401, a client sends an update request for data C to a distributed cache system;
the client sends an update request for the data C to a distributed cache system control node;
s2402, writing the update operation into a log by the distributed cache system;
the control node of the distributed cache system writes the update operation into a log;
s2403, the distributed cache system control node informs a cache controller, and the cache controller judges that the data C is in the cache of the programmable switch and informs the programmable switch to invalidate the data C;
wherein the distributed cache system control node informs the cache controller, the cache controller determines that the data C is in the cache of the programmable switch 1211, and informs the programmable switch 1211 to invalidate the data C;
s2404, the distributed cache system control node writes the updated data C into the memory cache node according to a hash function h1 used by the memory cache cluster;
the distributed cache system control node writes the updated data C into the memory cache node 1221 according to a hash function h1 used by the memory cache cluster;
s2405, updating the SSD cache cluster and the data C in the HDD storage cluster by the distributed cache system according to the log;
Wherein the distributed cache system updates data C in the SSD cache node 1232, the HDD storage node 1241, and the HDD storage node 1242 according to the log.
The data updating method of the data-heat-aware multi-layer storage structure provided by the embodiment of the application only needs to change the data in the memory cache cluster immediately, without synchronously maintaining the data of the whole system, while still ensuring data consistency; the delayed update of the remaining copies avoids immediate multi-copy maintenance and reduces the overhead the storage server spends on ensuring data consistency.
Based on the above embodiments, the present application has the following advantages:
(1) The application provides a data-heat-aware hierarchical storage mechanism. Data is divided into four categories according to data heat: cold data, hot data, hot-data-associated data, and the hottest data. The storage structure comprises a distributed cache system, a memory cache cluster and a programmable switch cache. The distributed cache system consists of persistent storage of data copies on mechanical hard disks and caching of a data copy on solid state disks. The data storage mode, storage location and addressing pattern vary with the data heat.
(2) The data-heat-aware hierarchical storage mechanism is a general data storage scheme. The storage object may be a file in a distributed file system such as GFS or HDFS, a key-value pair in a distributed key-value storage cluster such as Dynamo, a column group in a distributed column-oriented storage cluster such as Bigtable/Cassandra, or a document in a distributed document storage cluster such as MongoDB. Differences among the data types are masked by hashing in the distributed cache system.
(3) A method for caching data copies is provided. By taking the data copy migrated to the SSD cache cluster as the cache of the hot data and the associated data, the client directly accesses the data object cached in the SSD through the hash function, so that the response speed of data query is increased, and the overall cache hit rate of the data is improved.
(4) A data-heat-aware caching mechanism for hot-data-associated data is provided. It is neither practical nor necessary to cache all data in memory, but if only hot data is cached, some data queries will inevitably miss the cache. Based on the query correlation among data items, the application provides a data-heat-aware caching mechanism for hot-data-associated data: one copy of the associated data in the distributed cache system is promoted to the SSD cache cluster, and the data copy in the SSD cache cluster serves as a cache for client queries.
(5) A memory + solid-state dual-layer cache mechanism for hot data is presented. Hot data is organized and addressed independently in the memory cache and in the solid-state cache, so the two layers serve different purposes and functions. In the normal running state of the system, the in-memory cache responds to client access requests; when a memory cache node fails and cannot provide service, the SSD cache responds to client requests and assists the memory cache node in completing fault recovery. In contrast to classical distributed cache systems such as Redis/memcached, although the present system is a dual-layer cache, it adds no overhead for keeping multiple copies of the cached data consistent.
(6) A hottest-data caching mechanism based on programmable switches is proposed. To avoid the network congestion and packet loss caused by a large number of concurrent requests for hot data in the memory cache, the application provides a switch caching mechanism for the hottest data. The congestion status of the memory cache node's network is monitored; when congestion occurs, the caching function of the programmable switch is enabled and the hottest data is cached in the programmable switch. The programmable switch responds to request packets by matching the user's query target against its cached data.
(7) A data-heat-aware copy promotion and migration mechanism is provided. By promoting and migrating an existing data copy instead of creating an additional copy, the application simplifies maintenance of data consistency in the distributed system.
Meanwhile, the application has the following commercial value:
(1) Because the application is a universal layered storage mechanism, it is compatible with an enterprise's existing distributed cache system architecture; for example, the original system may be a distributed file system such as GFS or HDFS, a distributed key-value storage cluster such as Dynamo, a distributed column-oriented storage cluster such as Bigtable/Cassandra, or a distributed document storage cluster such as MongoDB. The application range is therefore wide.
(2) Without changing the consistency guarantees of the distributed cache system, the application improves the cache hit probability of hot data and associated data through data copy migration, at low cost and with great benefit.
(3) Congestion and packet loss at a memory cache node caused by the hottest data in that node are relieved by the programmable switch, which adaptively enables its caching function according to the volume of network requests directed at the memory cache node. Programmable switches and intelligent switches are already entering the market, so the application is readily feasible.
(4) The caching of hot-data-associated data provided by the application offers good flexibility. Although SSD hardware is costly, a small-scale SSD cache cluster can be deployed to cache only the most strongly associated data. As solid-state disk technology develops, performance keeps improving and cost keeps falling, so the SSD cache cluster can be enlarged and more associated data can be cached.
Based on the foregoing embodiments, the embodiments of the present application provide a data storage device and a data query device. Each device includes a number of units, and the modules included in those units may be implemented by a processor in a computer device, or alternatively by specific logic circuits. In practice, the processor may be a central processing unit (Central Processing Unit, CPU), a microprocessor (Microprocessor Unit, MPU), a digital signal processor (Digital Signal Processor, DSP), a field programmable gate array (Field Programmable Gate Array, FPGA), or the like.
Fig. 25 is a schematic diagram of a composition structure of a data storage device according to an embodiment of the present application, and as shown in fig. 25, a data storage device 2500 includes: an acquisition module 2510, a migration module 2520, wherein:
an acquiring module 2510, configured to acquire a data heat of target data in the distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster;
a migration module 2520 configured to dynamically migrate, based on the data warmth, a target data copy of the plurality of data copies between the first cache cluster and the storage cluster; other data copies of the plurality of data copies are persisted to the storage cluster.
In some embodiments, the migration module 2520 is further configured to: and under the condition that the target data copy is stored in the storage cluster and the data heat exceeds a first threshold, migrating the target data copy from the storage cluster to the first cache cluster.
In some embodiments, the first cache cluster includes a plurality of first cache nodes; the migration module 2520 is further configured to: determining a first hash value corresponding to the target data based on a first hash function; determining a first target cache node among the plurality of first cache nodes based on the first hash value; and migrating the target data copy from the storage cluster to the first target cache node.
In some embodiments, the migration module 2520 is further configured to: determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node; and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
In some embodiments, the migration module 2520 is further configured to: and under the condition that the target data copy is stored in the first cache cluster and the data heat is smaller than a second threshold, migrating the target data copy from the first cache cluster to the storage cluster.
In some embodiments, the storage cluster includes a plurality of storage nodes; the migration module 2520 is further configured to: determining a target storage node in the plurality of storage nodes based on a storage mechanism of the storage cluster itself; and migrating the target data copy from the first cache cluster to the target storage node.
In some embodiments, the migration module 2520 is further configured to: and under the condition that the data heat of the target data exceeds a first threshold value, migrating the data copy of the associated data of the target data from the storage cluster to the first cache cluster.
In some embodiments, the migration module 2520 is further configured to: and caching the target data to a second cache cluster in response to an access event for the target data.
In some embodiments, the second cache cluster includes a plurality of second cache nodes; the migration module 2520 is further configured to: determining a second hash value corresponding to the target data based on a second hash function; determining a second target cache node among the plurality of second cache nodes based on the second hash value; and caching the target data to the second target cache node.
In some embodiments, the migration module 2520 is further configured to: determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node; and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
In some embodiments, the migration module 2520 is further configured to: responding to an overload event of the second target cache node, and starting a cache function of a target programmable switch connected with the second target cache node; and caching the target data to the target programmable switch.
In some embodiments, the data storage device 2500 further comprises: a data updating module;
the data updating module is used for responding to a data updating request aiming at the target data and writing data updating operation into a log; the data updating request carries updated target data; invalidating target data in the target programmable switch and updating target data in the second target cache node based on the updated target data; and updating a plurality of data copies stored in the distributed cache system by the target data based on the log.
Fig. 26 is a schematic structural diagram of a data query device according to an embodiment of the present application, as shown in fig. 26, a data query device 2600 includes: a query module 2610, wherein:
a query module 2610, configured to send a first query request to a first cache cluster in the distributed cache system in response to a query event for target data, and receive a first query feedback;
the query module 2610 is further configured to send a second query request to a storage cluster in the distributed cache system and receive a second query feedback if the first query feedback characterizes that the target data does not exist in the first cache cluster;
The target data is stored with a plurality of data copies in the distributed cache system, the target data copies in the plurality of data copies are dynamically migrated between the first cache cluster and the storage cluster based on the heat of the target data, and other data copies in the plurality of data copies are permanently stored in the storage cluster.
In some embodiments, the first cache cluster includes a plurality of first cache nodes; the query module 2610 is further configured to: determining a first hash value corresponding to the target data based on a first hash function; determining a first target cache node among the plurality of first cache nodes based on the first hash value; and sending the first query request to the first target cache node.
In some embodiments, the query module 2610 is further configured to: determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node; and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
In some embodiments, the storage cluster includes a plurality of storage nodes; the query module 2610 is further configured to: determining a target storage node in the plurality of storage nodes based on a storage mechanism of the storage cluster itself; and sending the second query request to the target storage node.
In some embodiments, the query module 2610 is further configured to send a third query request to the second cache cluster and receive a third query feedback in response to a query event for the target data; and sending a first query request to a first cache cluster in the distributed cache system under the condition that the third query feedback characterizes that the target data does not exist in the second cache cluster.
In some embodiments, the second cache cluster includes a plurality of second cache nodes; the query module 2610 is further configured to determine a second hash value corresponding to the target data based on a second hash function; determining a second target cache node among the plurality of second cache nodes based on the second hash value; and sending a third query request to the second target cache node.
In some embodiments, the query module 2610 is further configured to determine a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node; and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
In some embodiments, the query module 2610 is further configured to send the third query request to a target programmable switch connected to the second target cache node; the third query request is used for indicating the target programmable switch to query the target data in a switch cache, and forwarding the third query request to the second target cache node when the target data does not exist in the switch cache; and receiving the third query feedback sent by the second target cache node.
In some embodiments, the third query request is further configured to instruct the target programmable switch to query the switch cache for the target data, and send a third query feedback carrying the target data if the target data exists in the switch cache; the query module 2610 is further configured to receive a third query feedback carrying the target data sent by the target programmable switch.
The description of the apparatus embodiments above is similar to that of the method embodiments above, with similar advantageous effects as the method embodiments. In some embodiments, the functions or modules included in the apparatus provided by the embodiments of the present application may be used to perform the methods described in the foregoing method embodiments, and for technical details that are not disclosed in the embodiments of the apparatus of the present application, reference should be made to the description of the embodiments of the method of the present application.
It should be noted that, in the embodiment of the present application, if the data storage and data query methods described above are implemented in the form of software functional modules and sold or used as independent products, they may also be stored in a computer readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence or in the part that contributes to the related art, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, an optical disk, or other media capable of storing program code. Thus, embodiments of the application are not limited to any specific hardware, software, or firmware, or to any combination of hardware, software, and firmware.
The embodiment of the application provides a computer device, which comprises a memory and a processor, wherein the memory stores a computer program capable of running on the processor, and the processor realizes part or all of the steps in the method when executing the program.
Embodiments of the present application provide a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs some or all of the steps of the above-described method. The computer readable storage medium may be transitory or non-transitory.
Embodiments of the present application provide a computer program comprising computer readable code which, when run in a computer device, causes a processor in the computer device to perform some or all of the steps for carrying out the above method.
Embodiments of the present application provide a computer program product comprising a non-transitory computer-readable storage medium storing a computer program which, when read and executed by a computer, performs some or all of the steps of the above-described method. The computer program product may be realized in particular by means of hardware, software or a combination thereof. In some embodiments, the computer program product is embodied as a computer storage medium, in other embodiments the computer program product is embodied as a software product, such as a software development kit (Software Development Kit, SDK), or the like.
It should be noted here that: the above description of various embodiments is intended to emphasize the differences between the various embodiments, the same or similar features being referred to each other. The above description of apparatus, storage medium, computer program and computer program product embodiments is similar to that of method embodiments described above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, the storage medium, the computer program and the computer program product of the present application, reference should be made to the description of the embodiments of the method of the present application.
Fig. 27 is a schematic diagram of a hardware entity of a computer device according to an embodiment of the present application, as shown in fig. 27, the hardware entity of the computer device 2700 includes: a processor 2701 and a memory 2702, wherein the memory 2702 stores a computer program executable on the processor 2701, the processor 2701 implementing the steps in the method of any of the embodiments described above when the program is executed.
The memory 2702 stores a computer program executable on a processor, and the memory 2702 is configured to store instructions and applications executable by the processor 2701, and may also cache data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by each module in the processor 2701 and the computer device 2700, which may be implemented by a FLASH memory (FLASH) or a random access memory (Random Access Memory, RAM).
The processor 2701 performs the steps of any of the data storage and data query methods described above when executing a program. The processor 2701 generally controls the overall operation of the computer device 2700.
Embodiments of the present application provide a computer storage medium storing one or more programs executable by one or more processors to implement the steps of the data storage and data query method of any of the embodiments above.
It should be noted here that: the description of the storage medium and apparatus embodiments above is similar to that of the method embodiments described above, with similar benefits as the method embodiments. For technical details not disclosed in the embodiments of the storage medium and the apparatus of the present application, please refer to the description of the method embodiments of the present application.
The processor may be at least one of an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a digital signal processor (Digital Signal Processor, DSP), a digital signal processing device (Digital Signal Processing Device, DSPD), a programmable logic device (Programmable Logic Device, PLD), a field programmable gate array (Field Programmable Gate Array, FPGA), a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, and a microprocessor. It will be appreciated that the electronic device implementing the above-mentioned processor function may be another device, and embodiments of the present application are not specifically limited in this respect.
The computer storage medium/Memory may be a Read Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Ferromagnetic Random Access Memory (FRAM), a Flash Memory, a magnetic surface memory, an optical disk, or a Compact Disc Read-Only Memory (CD-ROM); it may also be various terminals, such as mobile phones, computers, tablet devices, and personal digital assistants, that include one or any combination of the above-mentioned memories.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification are not necessarily all referring to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the steps/processes described above do not imply an order of execution; the execution order of the steps/processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The foregoing embodiment numbers of the present application are merely for the purpose of description and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The above described device embodiments are only illustrative; for example, the division of the units is only a logical functional division, and there may be other divisions in practice, such as: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the coupling, direct coupling, or communicative connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communicative connection between devices or units may be electrical, mechanical, or in other forms.
The units described above as separate components may or may not be physically separate, and components shown as units may or may not be physical units; can be located in one place or distributed to a plurality of network units; some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in each embodiment of the present application may be integrated in one processing unit, or each unit may be separately used as one unit, or two or more units may be integrated in one unit; the integrated units may be implemented in hardware or in hardware plus software functional units.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware related to program instructions, and the foregoing program may be stored in a computer readable storage medium, where the program, when executed, performs steps including the above method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read Only Memory (ROM), a magnetic disk or an optical disk, or the like, which can store program codes.
Alternatively, the above-described integrated units of the present application may be stored in a computer-readable storage medium if implemented in the form of software functional modules and sold or used as separate products. Based on such understanding, the technical solution of the present application may be embodied essentially or in a part contributing to the related art in the form of a software product stored in a storage medium, including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The foregoing is merely an embodiment of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes and substitutions are intended to be covered by the scope of the present application.

Claims (25)

1. A method of data storage, the method comprising:
acquiring the data heat of target data in a distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster;
dynamically migrating a target data copy of the plurality of data copies between the first cache cluster and the storage cluster based on the data heat; other data copies of the plurality of data copies are persisted to the storage cluster;
the consistency between the target data copy and the other data copies is maintained before and after the dynamic migration through a multi-copy consistency maintenance mechanism of the distributed cache system, and the number of data copies of the target data remains unchanged before and after the dynamic migration.
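For illustration only: claim 1 presupposes a data-heat value for the target data but does not fix how it is obtained. The sketch below assumes heat is an exponentially decayed access counter; the class name, half-life parameter, and timestamps are illustrative assumptions, not part of the claimed method.

```python
import math
import time


class HeatTracker:
    """Per-key data heat as an exponentially decayed access count.

    Illustrative assumption only: the claim requires a data-heat value but
    does not specify how it is computed.
    """

    def __init__(self, half_life_seconds=3600.0):
        self._decay = math.log(2) / half_life_seconds
        self._state = {}  # key -> (heat, timestamp of last update)

    def record_access(self, key, now=None):
        now = time.time() if now is None else now
        heat, ts = self._state.get(key, (0.0, now))
        heat = heat * math.exp(-self._decay * (now - ts)) + 1.0
        self._state[key] = (heat, now)
        return heat

    def heat(self, key, now=None):
        now = time.time() if now is None else now
        heat, ts = self._state.get(key, (0.0, now))
        return heat * math.exp(-self._decay * (now - ts))
```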
2. The method of claim 1, wherein dynamically migrating a target data copy of the plurality of data copies between the first cache cluster and the storage cluster based on the data heat comprises:
and under the condition that the target data copy is stored in the storage cluster and the data heat exceeds a first threshold, migrating the target data copy from the storage cluster to the first cache cluster.
3. The method of claim 2, wherein the first cache cluster comprises a plurality of first cache nodes; the migration of the target data copy from the storage cluster to the first cache cluster includes:
determining a first hash value corresponding to the target data based on a first hash function;
determining a first target cache node among the plurality of first cache nodes based on the first hash value;
and migrating the target data copy from the storage cluster to the first target cache node.
4. The method of claim 3, wherein the determining a first target cache node among the plurality of first cache nodes based on the first hash value comprises:
determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node;
and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
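For illustration only: claims 3 and 4 select the first target cache node by mapping the target data's hash value onto a hash ring whose arcs are the nodes' management areas, i.e. consistent hashing. A minimal sketch follows; the node names, virtual-node count, and use of MD5 as the first hash function are assumptions, not requirements of the claims.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring: each first cache node manages the arc between
    its predecessor's position and its own position."""

    def __init__(self, nodes, vnodes=64):
        entries = []
        for node in nodes:
            for i in range(vnodes):
                entries.append((self._hash(f"{node}#{i}"), node))
        entries.sort()
        self._positions = [pos for pos, _ in entries]
        self._nodes = [node for _, node in entries]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)

    def node_for(self, data_key):
        """Map the data's hash value onto the ring and return the cache node
        whose management area contains that position."""
        pos = self._hash(data_key)
        idx = bisect.bisect_right(self._positions, pos) % len(self._positions)
        return self._nodes[idx]


# Example: HashRing(["cache-1", "cache-2", "cache-3"]).node_for("user:42")
```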
5. The method of claim 1, wherein dynamically migrating a target data copy of the plurality of data copies between the first cache cluster and the storage cluster based on the data heat comprises:
and under the condition that the target data copy is stored in the first cache cluster and the data heat is smaller than a second threshold, migrating the target data copy from the first cache cluster to the storage cluster.
6. The method of claim 5, wherein the storage cluster comprises a plurality of storage nodes; the migration of the target data copy from the first cache cluster to the storage cluster includes:
determining a target storage node in the plurality of storage nodes based on a storage mechanism of the storage cluster itself;
and migrating the target data copy from the first cache cluster to the target storage node.
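For illustration only: claims 2, 5, and 6 together describe a two-threshold policy, promoting the migratable copy to the first cache cluster when its data heat rises above the first threshold and demoting it back to the storage cluster when the heat falls below the second threshold. A minimal planning function under those assumptions (tier names and the single-function shape are illustrative):

```python
def plan_migration(current_tier, data_heat, first_threshold, second_threshold):
    """Return the destination tier for the migratable target data copy,
    or None if the copy should stay where it is."""
    if current_tier == "storage" and data_heat > first_threshold:
        return "first_cache"   # hot: promote into the first cache cluster
    if current_tier == "first_cache" and data_heat < second_threshold:
        return "storage"       # cooled down: demote back to the storage cluster
    return None
```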
7. The method according to any one of claims 1 to 6, further comprising:
and under the condition that the data heat of the target data exceeds a first threshold value, migrating the data copy of the associated data of the target data from the storage cluster to the first cache cluster.
8. The method according to any one of claims 1 to 6, further comprising:
and caching the target data to a second cache cluster in response to an access event for the target data.
9. The method of claim 8, wherein the second cache cluster comprises a plurality of second cache nodes; the caching the target data to a second cache cluster includes:
determining a second hash value corresponding to the target data based on a second hash function;
determining a second target cache node among the plurality of second cache nodes based on the second hash value;
and caching the target data to the second target cache node.
10. The method of claim 9, wherein the determining a second target cache node among the plurality of second cache nodes based on the second hash value comprises:
determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node;
and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
11. The method according to claim 9, wherein the method further comprises:
in response to an overload event of the second target cache node, enabling a cache function of a target programmable switch connected to the second target cache node;
and caching the target data to the target programmable switch.
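For illustration only: claim 11 enables the cache function of the programmable switch in front of an overloaded cache node and places the target data in the switch cache. A minimal sketch under assumed object interfaces; a real programmable switch is driven through its own control plane (e.g. table entries) rather than direct method calls.

```python
def handle_overload(second_target_cache_node, target_programmable_switch, target_key):
    """On an overload event, enable the switch cache and populate it with
    the target data held by the overloaded cache node."""
    target_programmable_switch.enable_cache()
    value = second_target_cache_node.get(target_key)
    if value is not None:
        target_programmable_switch.put(target_key, value)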
12. The method of claim 11, wherein the method further comprises:
writing a data update operation to a log in response to a data update request for the target data; the data update request carries updated target data;
invalidating the target data in the target programmable switch and updating the target data in the second target cache node based on the updated target data;
and updating, based on the log, the plurality of data copies of the target data stored in the distributed cache system.
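For illustration only: claim 12 orders an update as log first, then invalidation of the switch-cached entry, then the second target cache node, then the remaining copies brought up to date from the log. A minimal sketch; the collaborating objects and their method names are assumptions, not an API defined by the patent.

```python
class UpdatePath:
    """Ordering of the update steps described in claim 12."""

    def __init__(self):
        self.log = []

    def update(self, key, new_value, switch, cache_node, cache_system):
        self.log.append(("update", key, new_value))   # 1. write the operation to a log
        switch.invalidate(key)                        # 2. invalidate the switch-cached copy
        cache_node.put(key, new_value)                # 3. update the second target cache node
        for _op, k, v in self.log:                    # 4. replay the log against all copies
            cache_system.update_all_copies(k, v)
        self.log.clear()
```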
13. A method of querying data, the method comprising:
in response to a query event for target data, sending a first query request to a first cache cluster in a distributed cache system, and receiving first query feedback;
sending a second query request to a storage cluster in the distributed cache system and receiving second query feedback under the condition that the first query feedback indicates that the target data does not exist in the first cache cluster;
the target data is stored with a plurality of data copies in the distributed cache system, a target data copy of the plurality of data copies is dynamically migrated between the first cache cluster and the storage cluster based on the data heat of the target data, and other data copies of the plurality of data copies are persisted to the storage cluster; the consistency between the target data copy and the other data copies is maintained before and after the dynamic migration through a multi-copy consistency maintenance mechanism of the distributed cache system, and the number of data copies of the target data remains unchanged before and after the dynamic migration.
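For illustration only: claim 13 queries the first cache cluster and falls back to the storage cluster on a miss. A minimal sketch; the cluster clients and their shared get() method are assumptions.

```python
def query(key, first_cache_cluster, storage_cluster):
    """Lookup order of claim 13: first cache cluster, then storage cluster."""
    value = first_cache_cluster.get(key)   # first query request / first query feedback
    if value is not None:
        return value
    return storage_cluster.get(key)        # second query request / second query feedback
```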
14. The method of claim 13, wherein the first cache cluster comprises a plurality of first cache nodes; the sending a first query request to a first cache cluster in the distributed cache system includes:
determining a first hash value corresponding to the target data based on a first hash function;
determining a first target cache node among the plurality of first cache nodes based on the first hash value;
and sending the first query request to the first target cache node.
15. The method of claim 14, wherein the determining a first target cache node among the plurality of first cache nodes based on the first hash value comprises:
determining a first data position of the target data in a first hash ring based on the first hash value; the first hash ring comprises a management area corresponding to each first cache node;
and taking the first cache node corresponding to the management area in which the first data position falls as the first target cache node.
16. The method of claim 13, wherein the storage cluster comprises a plurality of storage nodes; the sending a second query request to a storage cluster in the distributed cache system includes:
determining a target storage node in the plurality of storage nodes based on a storage mechanism of the storage cluster itself;
and sending the second query request to the target storage node.
17. The method of any of claims 13 to 16, wherein the sending a first query request to a first cache cluster in the distributed cache system in response to a query event for target data comprises:
in response to a query event for target data, sending a third query request to a second cache cluster, and receiving third query feedback;
and sending a first query request to a first cache cluster in the distributed cache system under the condition that the third query feedback indicates that the target data does not exist in the second cache cluster.
18. The method of claim 17, wherein the second cache cluster comprises a plurality of second cache nodes; the sending a third query request to the second cache cluster includes:
determining a second hash value corresponding to the target data based on a second hash function;
determining a second target cache node among the plurality of second cache nodes based on the second hash value;
and sending the third query request to the second target cache node.
19. The method of claim 18, wherein the determining a second target cache node among the plurality of second cache nodes based on the second hash value comprises:
determining a second data location of the target data in a second hash ring based on the second hash value; the second hash ring comprises a management area corresponding to each second cache node;
and taking the second cache node corresponding to the management area in which the second data position falls as the second target cache node.
20. The method of claim 18, wherein the sending a third query request to the second target cache node and receiving third query feedback comprises:
sending the third query request to a target programmable switch connected with the second target cache node; the third query request is used for instructing the target programmable switch to query a switch cache for the target data, and to forward the third query request to the second target cache node when the target data does not exist in the switch cache;
and receiving the third query feedback sent by the second target cache node.
21. The method of claim 20, wherein the third query request is further used for instructing the target programmable switch to query the switch cache for the target data, and to send third query feedback carrying the target data if the target data is present in the switch cache;
the receiving the third query feedback comprises: receiving the third query feedback carrying the target data sent by the target programmable switch.
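For illustration only: claims 17 to 21 extend the lookup to four tiers, the programmable-switch cache, the second target cache node behind it, the first cache cluster, and finally the storage cluster. A minimal sketch; the tier clients and the shared get() interface are assumptions.

```python
def tiered_query(key, switch_cache, second_cache_node, first_cache_cluster, storage_cluster):
    """Overall lookup order implied by claims 17 to 21, nearest tier first."""
    for tier in (switch_cache, second_cache_node, first_cache_cluster, storage_cluster):
        value = tier.get(key)
        if value is not None:
            return value
    return None   # the target data does not exist in any tier
```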
22. A multi-tiered storage system comprising a cache controller and a distributed cache system comprising a storage cluster and a first cache cluster, wherein:
the distributed cache system is used for storing a plurality of data copies of target data;
the cache controller is used for controlling a target data copy of the plurality of data copies to dynamically migrate between the first cache cluster and the storage cluster based on the data heat of the target data, and other data copies of the plurality of data copies are persisted to the storage cluster;
the consistency between the target data copy and the other data copies is maintained before and after the dynamic migration through a multi-copy consistency maintenance mechanism of the distributed cache system, and the number of data copies of the target data remains unchanged before and after the dynamic migration.
23. A data storage device, comprising:
the acquisition module is used for acquiring the data heat of the target data in the distributed cache system; the target data is stored with a plurality of data copies in the distributed cache system, and the distributed cache system comprises a storage cluster and a first cache cluster;
a migration module, used for dynamically migrating, based on the data heat, a target data copy of the plurality of data copies between the first cache cluster and the storage cluster; other data copies of the plurality of data copies are persisted to the storage cluster;
the consistency between the target data copy and the other data copies is maintained before and after the dynamic migration through a multi-copy consistency maintenance mechanism of the distributed cache system, and the number of data copies of the target data remains unchanged before and after the dynamic migration.
24. A data query device, comprising:
the query module is used for responding to a query event for target data, sending a first query request to a first cache cluster in the distributed cache system, and receiving first query feedback;
the query module is further used for sending a second query request to a storage cluster in the distributed cache system and receiving second query feedback when the first query feedback indicates that the target data does not exist in the first cache cluster;
the target data is stored with a plurality of data copies in the distributed cache system, a target data copy of the plurality of data copies is dynamically migrated between the first cache cluster and the storage cluster based on the data heat of the target data, and other data copies of the plurality of data copies are persisted to the storage cluster; the consistency between the target data copy and the other data copies is maintained before and after the dynamic migration through a multi-copy consistency maintenance mechanism of the distributed cache system, and the number of data copies of the target data remains unchanged before and after the dynamic migration.
25. A computer readable storage medium having stored thereon a computer program, which when executed by a processor performs the steps of the method of any of claims 1 to 12 or the steps of the method of any of claims 13 to 21.
CN202310143690.5A 2023-02-21 2023-02-21 Data storage and data query method, device, equipment and storage medium Active CN115878513B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310143690.5A CN115878513B (en) 2023-02-21 2023-02-21 Data storage and data query method, device, equipment and storage medium


Publications (2)

Publication Number Publication Date
CN115878513A CN115878513A (en) 2023-03-31
CN115878513B true CN115878513B (en) 2023-08-15

Family

ID=85761434

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310143690.5A Active CN115878513B (en) 2023-02-21 2023-02-21 Data storage and data query method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115878513B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117591039A (en) * 2024-01-18 2024-02-23 济南浪潮数据技术有限公司 Distributed storage method, system, equipment and medium

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102364474A (en) * 2011-11-17 2012-02-29 中国科学院计算技术研究所 Metadata storage system for cluster file system and metadata management method
CN106941525A (en) * 2017-03-14 2017-07-11 郑州云海信息技术有限公司 A kind of method that data consistency is kept in distributed memory system
WO2017122922A1 (en) * 2016-01-11 2017-07-20 충북대학교 산학협력단 Load balancing system using data replication and data migration in distributed in-memory environment
CN107844269A (en) * 2017-10-17 2018-03-27 华中科技大学 A kind of layering mixing storage system and method based on uniformity Hash
CN108810041A (en) * 2017-04-27 2018-11-13 华为技术有限公司 A kind of data write-in of distributed cache system and expansion method, device
CN109144895A (en) * 2017-06-15 2019-01-04 中兴通讯股份有限公司 A kind of date storage method and device
CN109726191A (en) * 2018-12-12 2019-05-07 中国联合网络通信集团有限公司 A kind of processing method and system across company-data, storage medium
CN110677348A (en) * 2019-09-17 2020-01-10 阿里巴巴集团控股有限公司 Data distribution method, access method and respective devices based on cache cluster routing
CN112076464A (en) * 2020-09-04 2020-12-15 腾讯科技(深圳)有限公司 Data request processing method and device, computer equipment and storage medium
CN115442439A (en) * 2022-08-31 2022-12-06 云知声智能科技股份有限公司 Distributed cache cluster management method, system, terminal and storage medium


Also Published As

Publication number Publication date
CN115878513A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
US7124169B2 (en) Network system and its switches
EP2062123B1 (en) Automatic load spreading in a clustered network storage system
CN115878513B (en) Data storage and data query method, device, equipment and storage medium
EP3131265B1 (en) Data prefetching method for distributed hash table dht storage system, node, and system
US20040030731A1 (en) System and method for accessing files in a network
US10168912B2 (en) Short stroking and data tiering for a distributed filesystem
US20030115420A1 (en) Methods and apparatus for implementing a chche replacement scheme
WO2010111875A1 (en) Data processing method, comprehensive data node, master node and system
US10127156B1 (en) Caching techniques
CN113901024A (en) Data storage system, data storage method, readable medium, and electronic device
JP3776496B2 (en) Data storage system
CN114844846A (en) Multi-level cache distributed key value storage system based on programmable switch
CN114817195A (en) Method, system, storage medium and equipment for managing distributed storage cache
JP5163171B2 (en) Cache system and server
Tiwari et al. Dynamic Web caching: For robustness, low latency & disconnection handling
CN112395453B (en) Self-adaptive distributed remote sensing image caching and searching method
CN102055795A (en) Distributed file system metadata management method
CN115328857A (en) File access method, device, client and storage medium
US10067877B1 (en) Method, apparatus and computer program product for use in managing multi-cache data storage systems
Cao et al. Data allocation of large-scale key-value store system using kinetic drives
CN115858409A (en) Data prefetching method, computing node and storage system
CN110659157A (en) Distributed multi-language retrieval platform and method for lossless recovery
JP2014203329A (en) Storage system, node device, and data management method
CN117539915B (en) Data processing method and related device
JP4514222B2 (en) Data storage system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant