CN116974467A - Data caching processing method, device and system - Google Patents

Data caching processing method, device and system

Info

Publication number
CN116974467A
CN116974467A
Authority
CN
China
Prior art keywords
data
target
query
module
target user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310736341.4A
Other languages
Chinese (zh)
Inventor
王淏舟
杨峻峰
赵园
郭罡
冯雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Tuoshupai Technology Development Co ltd
Original Assignee
Hangzhou Tuoshupai Technology Development Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Tuoshupai Technology Development Co ltd filed Critical Hangzhou Tuoshupai Technology Development Co ltd
Priority to CN202310736341.4A priority Critical patent/CN116974467A/en
Publication of CN116974467A publication Critical patent/CN116974467A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0655Vertical data movement, i.e. input-output transfer; data movement between one or more hosts and one or more storage devices
    • G06F3/0656Data buffering arrangements
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601Interfaces specially adapted for storage systems
    • G06F3/0628Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0647Migration mechanisms

Abstract

The application relates to a data caching processing method, device and system. The method comprises the following steps: obtaining data to be cached, and caching the data to be cached as initial cold data to a preset data storage module; during data queries, retrieving current dynamic hot data from the initial cold data based on the detected number of data queries, and migrating the current dynamic hot data from the data storage module to a preset cache module; and in response to a current query instruction, querying in the cache module; if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data. By adopting the method, the data query efficiency of the database cluster can be improved and user queries can be answered faster.

Description

Data caching processing method, device and system
Technical Field
The present application relates to the field of distributed data storage technologies, and in particular, to a data caching method, device and system.
Background
With the development of big data technology, companies and individuals increasingly depend on storing and retrieving data. An existing cloud-native database is usually deployed in a public cloud environment: metadata information is stored in a metadata service, user data is stored in public cloud object storage, and all data is transferred over the network whenever a user queries or retrieves it. Metadata information refers to the data used to describe and execute operations such as user data queries in the database; metadata is key data of the database. User data is data generated by a user and imported into the database for storage; it is what is usually meant by "data" in the general sense and consists of a number of files.
In the prior art, user data is acquired as follows: a computing node pulls the metadata information according to the data query instruction sent by the user, processes the data query instruction, pulls the user data required by the user according to the metadata information, and returns the user data to the user. However, in the prior art, every access to user data requires the nodes to access the metadata service, so the data access load on the metadata service becomes excessive; the metadata service nodes then occupy a large amount of network bandwidth, network overhead and cost rise, database performance suffers, and the overall usage and maintenance costs of the system are high, which increases the latency of user queries.
At present, no effective solution has been proposed for the problem of low data query efficiency of database clusters.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a data caching method, device and system for solving the above technical problems.
In a first aspect, the present application provides a data cache processing method. The method comprises the following steps:
obtaining data to be cached, and caching the data to be cached as initial cold data to a preset data storage module; during data queries, retrieving current dynamic hot data from the initial cold data based on the detected number of data queries, and migrating the current dynamic hot data from the data storage module to a preset cache module; and in response to a current query instruction, querying in the cache module; if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
In one embodiment, in response to a current query instruction, querying in the current dynamic hot data, and, if the query fails, querying in the data storage module to obtain target user data and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data, includes:
in response to the current query instruction, acquiring a query count result corresponding to the target user data;
if the query count result of the target user data indicates that the number of queries for the target user data is greater than or equal to a preset query count threshold, migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
In one embodiment, the target user data includes target user data information and target metadata information, the method further comprising:
and binding and storing the target user data information and the target metadata information in the cache module, wherein the target user data information and the target metadata information are in one-to-one correspondence.
In one embodiment, the target user data includes a target user data address and target data, wherein the target data includes the target user data information and the target metadata information. After the new dynamic hot data is obtained, or after a query in the current dynamic hot data in response to the current query instruction succeeds, the method further comprises:
acquiring a target user data address based on the current query instruction;
and returning the target data in response to a target data query instruction obtained based on the target user data address, so as to realize the query of the target user data.
In one embodiment, the target user data address includes a target metadata address and a target data address, and the returning the target data in response to the target data query instruction obtained based on the target user data address, to implement the query on the target user data includes:
responding to a target data address inquiry instruction obtained based on the target metadata address, and returning the target data address, wherein the target metadata address and the target data address are in one-to-one correspondence;
and in response to a target data query instruction obtained based on the target data address, returning the target data, thereby completing the query of the target user data.
In one embodiment, the cache module includes a first cache module and a second cache module, the current dynamic hot data is stored in the first cache module, and the method further comprises:
if the amount of data in the cache module is detected to be equal to a preset cache threshold, calculating temperature information for the dynamic hot data in the cache module based on the number of data queries;
if current dynamic warm data is retrieved from the current dynamic hot data based on the temperature information, storing the current dynamic warm data in the second cache module, wherein the temperature information of the current dynamic warm data is smaller than a preset first temperature threshold;
and if current dynamic cold data is retrieved from the current dynamic warm data based on the temperature information, storing the current dynamic cold data in the data storage module, wherein the temperature information of the current dynamic cold data is smaller than a preset second temperature threshold, the second temperature threshold being smaller than the first temperature threshold.
In one embodiment, the cache module includes cache control information, and calculating the temperature information of the dynamic hot data in the cache module based on the number of data queries includes:
calculating the temperature information of each piece of dynamic data based on the initial temperature, the number of data queries and a preset cooling curve recorded in the cache control information for all dynamic data, wherein the initial temperature is obtained according to a preset initial temperature setting.
In one embodiment, after obtaining the data to be cached and caching the data to be cached as initial cold data to a preset data storage module, the method further includes:
acquiring a preset feature extraction algorithm;
searching the current dynamic hot data in the initial cold data based on a feature extraction algorithm;
and transferring the current dynamic hot data from the data storage module to the cache module to obtain new dynamic hot data.
In a second aspect, the application further provides a data cache processing device. The device comprises:
the acquisition module is used for acquiring data to be cached, and caching the data to be cached as initial cold data to the preset data storage module;
the query module is used for retrieving the current dynamic hot data from the initial cold data based on the detected data query times in the data query process and transferring the current dynamic hot data from the data storage module to a preset cache module;
the generating module is used for, in response to the current query instruction, querying in the cache module; if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
In a third aspect, the application further provides a data cache processing system. The system comprises a memory storing a computer program and a processor which, when executing the computer program, performs the following steps:
the method comprises the steps of obtaining data to be cached, and caching the data to be cached as initial cold data to a preset data storage module;
in the data query process, based on the detected data query times, retrieving current dynamic hot data from the initial cold data, and transferring the current dynamic hot data from the data storage module to a preset cache module;
and in response to the current query instruction, querying in the cache module; if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
With the data caching processing method, device and system above, the obtained data to be cached is first stored in the data storage module as initial cold data; next, according to the detected number of data queries, current dynamic hot data is retrieved from the initial cold data and stored in the cache module; finally, a query is made in the cache module according to the current query instruction, and if the query fails, the data storage module is queried to obtain the target user data corresponding to the current query instruction, which is then stored in the cache module as new dynamic hot data. In this way, a large amount of acquired data can be stored in tiers, which improves lookup efficiency; furthermore, some high-frequency data is prefetched and cached according to the instructions entered by the user, which speeds up user queries, reduces the load on the metadata service and the object storage service, and improves the overall performance of the database cluster.
Drawings
FIG. 1 is an application environment diagram of a data cache processing method in one embodiment;
FIG. 2 is a flow chart of a data caching method according to an embodiment;
FIG. 3 is a block diagram schematically illustrating a data cache processing method in one embodiment;
FIG. 4 is a schematic block diagram of a data caching method in another embodiment;
FIG. 5 is a block diagram schematically illustrating a data buffering method in a preferred embodiment;
FIG. 6 is a block diagram of a data cache processing apparatus in one embodiment;
FIG. 7 is a block diagram of a data cache processing system in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The data caching processing method provided by the embodiments of the application can be applied in the application environment shown in fig. 1, where the terminal 102 communicates with the server 104 via a network. The data storage system may store data that the server 104 needs to process; it may be integrated on the server 104 or located on a cloud or other network server. Data to be cached is first cached in the data storage module; current dynamic hot data is then retrieved from the data storage module according to the number of data queries and stored in the cache module; finally, in response to a current query instruction, a query is made in the cache module, and if the query fails, the data storage module is queried to obtain target user data, which is migrated from the data storage module to the cache module as new dynamic hot data. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, internet-of-things device, or portable wearable device; internet-of-things devices include smart speakers, smart televisions, smart air conditioners, smart in-vehicle devices, and the like, and portable wearable devices include smart watches, smart bracelets, headsets, and the like. The server 104 may be implemented as a stand-alone server or as a server cluster composed of multiple servers.
In one embodiment, as shown in fig. 2, a data caching processing method is provided. The method is described as applied to the terminal in fig. 1 for illustration; it is to be understood that the method can also be applied to a server, or to a system comprising the terminal and the server and implemented through interaction between them. Fig. 2 is a flowchart of the data caching processing method of this embodiment, which includes the following steps:
step S202, data to be cached is obtained, and the data to be cached is cached to a preset data storage module as initial cold data.
In general, a large amount of data to be cached is acquired at once. The data to be cached usually consists of user data that has been generated by a user and imported into a database for storage; it is composed of a number of files and can number in the millions. In this application, the data storage module is used for storing cold data and, in practice, is generally composed of a metadata server and an object storage server. After the large amount of data to be cached is obtained, and before the user's first query, all of the data to be cached is stored in the data storage module and set to the cold data type.
In step S204, during the data query process, based on the detected number of data queries, current dynamic hot data is retrieved from the initial cold data, and the current dynamic hot data is migrated from the data storage module to a preset cache module.
The cache module is used for storing dynamic hot data and, in practical applications, usually consists of a disk and a memory. Whether data retrieved from the initial cold data can be set to the dynamic hot data type is determined by the number of data queries, which can be configured by the user and is generally set to 1 or 2. After the current dynamic hot data is retrieved, it is stored in the cache module and set to the hot data type.
Step S206: in response to the current query instruction, querying in the cache module; if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
A current query instruction input by the user is acquired and a query is first made in the cache module; if target user data matching the current query instruction cannot be found in the cache module, the data storage module is queried to obtain the target user data. After the target user data is acquired, it is migrated from the data storage module to the cache module and set to the hot data type.
Through steps S202 to S206, if the target user data corresponding to the current query instruction is of the cold data type, then after the target user data is obtained it is stored in the cache module as new dynamic hot data. In this way, the cache module can be dynamically updated according to the data queried by the user, so the method adapts better to the query requirements of different users and improves query efficiency; furthermore, processing the large amount of data in tiers and updating it in real time reduces memory usage, reduces the number of accesses to the object storage service, and lowers the cost of user queries.
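Steps S202 to S206 can be sketched as a two-tier lookup that falls back to the data storage module on a cache miss and then promotes the fetched data. This is an illustrative sketch only; the class and method names are hypothetical and not taken from the patent.

```python
class TieredStore:
    """Minimal sketch of a cache module backed by a data storage module."""

    def __init__(self):
        self.cache = {}      # cache module: dynamic hot data
        self.storage = {}    # data storage module: initial cold data

    def load(self, items):
        # All data to be cached starts out as cold data in the storage module.
        self.storage.update(items)

    def query(self, key):
        # First query the cache module.
        if key in self.cache:
            return self.cache[key]
        # On a miss, query the data storage module.
        value = self.storage.get(key)
        if value is not None:
            # Migrate the target user data into the cache module,
            # making it new dynamic hot data.
            self.cache[key] = value
            del self.storage[key]
        return value

store = TieredStore()
store.load({"t1": "row-data"})
assert "t1" not in store.cache          # initially cold
assert store.query("t1") == "row-data"  # miss, fetched from storage
assert "t1" in store.cache              # promoted to hot after the miss
```

A subsequent query for the same key is then served from the cache module directly, without touching the storage tier.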
In one embodiment, in response to a current query instruction, querying in the current dynamic hot data, and, if the query fails, querying in the data storage module to obtain target user data and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data, includes:
in response to the current query instruction, acquiring a query count result corresponding to the target user data; if the query count result of the target user data indicates that the number of queries for the target user data is greater than or equal to a preset query count threshold, migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
Specifically, if the number of queries for the target user data is greater than or equal to a preset query count threshold, the target user data is migrated to the cache module and set to the hot data type. The query count threshold can generally be set to 1, 2, and so on. With this threshold in place, the high-frequency data that a user will need in the near term can be screened out more accurately and migrated to the cache module, so that low-frequency data touched by accident is not mistakenly marked as high-frequency and stored in the cache; this reduces memory usage and further improves the user's query efficiency. Setting high-frequency data to the hot data type and migrating it to the cache module speeds up repeat queries, stores cold and hot data in separate tiers, and reduces the user's query cost.
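A minimal sketch of this query-count gate follows; the threshold value of 2 and all names are hypothetical illustrations. A block is promoted to the cache only once its query count reaches the threshold, so a single accidental access does not pollute the cache.

```python
from collections import defaultdict

QUERY_THRESHOLD = 2               # preset query count threshold (illustrative)
query_counts = defaultdict(int)   # per-key query statistics
cache, storage = {}, {"t1": "row-data"}

def query(key):
    query_counts[key] += 1
    if key in cache:
        return cache[key]
    value = storage[key]
    if query_counts[key] >= QUERY_THRESHOLD:
        cache[key] = value        # promoted: new dynamic hot data
        del storage[key]
    return value

assert query("t1") == "row-data" and "t1" not in cache  # first hit: stays cold
assert query("t1") == "row-data" and "t1" in cache      # second hit: promoted
```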
In one embodiment, the target user data includes target user data information and target metadata information, the method further comprising:
and binding and storing the target user data information and the target metadata information in the cache module, wherein the target user data information and the target metadata information are in one-to-one correspondence.
Specifically, the data to be cached includes the target user data, and each piece of data to be cached consists of metadata information and user data information. The metadata information is the data used to describe and execute operations such as user data queries in the database; the metadata referred to in this application is stored independently and is key data of the database: once it is damaged, the database stops serving and cannot recover. This application binds the data in the cache space for storage, i.e. block storage: each block of cache space contains the corresponding user data information together with the statistics relating to it, used for analyzing the data temperature, as well as the metadata, index information, reference count and the like required to process the user data information. The user data information and metadata information in each cache block correspond one to one. Binding the metadata information to the user data information in this way improves the query speed of cached data and the overall performance of the database cluster; furthermore, since cold data, i.e. low-frequency data, is queried only a few times, the user data and metadata within cold data can likewise be bound.
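The block layout described above can be sketched as a record that binds the user data to its metadata one to one, together with the statistics used for temperature analysis. The field names here are hypothetical illustrations, not the patent's concrete structure.

```python
from dataclasses import dataclass, field
import time

@dataclass
class CacheBlock:
    """One block of cache space: user data bound to its metadata (1:1)."""
    user_data: bytes
    metadata: dict                 # describes/executes queries on the data
    ref_count: int = 0             # reference count, used for temperature analysis
    loaded_at: float = field(default_factory=time.time)  # caching time statistic

block = CacheBlock(b"...file contents...", {"table": "t1", "index": [0, 4]})
block.ref_count += 1               # each access is recorded on the block itself
assert block.metadata["table"] == "t1"
```

Because the metadata travels with the user data in the same block, a query that hits the block never needs a separate round trip to the metadata service.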
In one embodiment, the target user data includes a target user data address and target data, wherein the target data includes the target user data information and the target metadata information. After the new dynamic hot data is obtained, or after a query in the current dynamic hot data in response to the current query instruction succeeds, the method further comprises:
acquiring the target user data address based on the current query instruction; and in response to a target data query instruction obtained based on the target user data address, returning the target data, thereby completing the query of the target user data.
Specifically, when the target user data is acquired based on the current query instruction, the target user data information can be returned directly after the query in the cache module or data storage module, or the address corresponding to the target user data can be returned instead; the user then issues a further query instruction for the target data according to the returned target user data address, and the target user data is obtained according to that target data query instruction. In this application, the user can choose whether to have the target user data returned directly after it is found, or to have the corresponding target user data address returned first and the target user data returned afterwards. If the address is returned first and the data second, the data can be located precisely from the acquired data address, improving query efficiency; moreover, since address information is far smaller than the data itself, transmitting addresses over the network is faster, more efficient, and reduces the network transmission load.
In one embodiment, the target user data address includes a target metadata address and a target data address, and, in response to a target data query instruction obtained based on the target user data address, returning the target data to complete the query of the target user data includes:
responding to a target data address inquiry instruction obtained based on the target metadata address, and returning the target data address, wherein the target metadata address and the target data address are in one-to-one correspondence;
and in response to a target data query instruction obtained based on the target data address, returning the target data, thereby completing the query of the target user data.
Specifically, when the address corresponding to the target user data is fed back to the user, the target metadata address is generally returned first; after obtaining the target metadata address, the user sends an address query instruction for the target data according to it, the target data address is returned in response, and finally the user issues a query for the target data according to the target data address, thereby obtaining the target data. Since the target data is located through the metadata, and the metadata is bound to the target data, the target data can be queried efficiently; moreover, transmitting addresses over the network instead of the data itself greatly reduces the network load.
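The address-based exchange above can be sketched as a three-step resolution: metadata address, then data address, then the data itself. The table contents and address formats are purely illustrative assumptions.

```python
# Each lookup table stands in for one round trip in the protocol.
metadata_addr_of = {"t1": "meta://t1"}        # query key -> target metadata address
data_addr_of = {"meta://t1": "obj://blk-17"}  # metadata addr -> data addr (1:1)
object_store = {"obj://blk-17": b"row-data"}  # data addr -> target data

def resolve_and_fetch(query_key):
    meta_addr = metadata_addr_of[query_key]   # step 1: return metadata address
    data_addr = data_addr_of[meta_addr]       # step 2: return data address
    return object_store[data_addr]            # step 3: return the target data

assert resolve_and_fetch("t1") == b"row-data"
```

Only the final step moves bulk data; the first two steps exchange short addresses, which is why the scheme keeps network load low.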
In one embodiment, as shown in fig. 3, the cache module includes a first cache module and a second cache module, the current dynamic hot data is stored in the first cache module, and the method further comprises:
if the amount of data in the cache module is detected to be equal to a preset cache threshold, calculating temperature information for the dynamic hot data in the cache module based on the number of data queries;
if current dynamic warm data is retrieved from the current dynamic hot data based on the temperature information, storing the current dynamic warm data in the second cache module, wherein the temperature information of the current dynamic warm data is smaller than a preset first temperature threshold;
and if current dynamic cold data is retrieved from the current dynamic warm data based on the temperature information, storing the current dynamic cold data in the data storage module, wherein the temperature information of the current dynamic cold data is smaller than a preset second temperature threshold, the second temperature threshold being smaller than the first temperature threshold.
Specifically, the first cache module may be the memory portion and the second cache module the disk portion. The dynamic hot data in the cache module is further processed in tiers: data used less frequently than the hot data type but more frequently than the cold data type is set to the warm data type and stored in the second cache module, while data of the cold data type is stored as shown in fig. 3. When the cache space usage is detected to have reached a cache threshold, which the user can set independently, swap-in and swap-out processing begins. First, the data information in the cache module is examined again and the temperature information of the currently cached data is calculated from it; data whose temperature is below the first temperature threshold but above the second is swapped to the second cache module, and data whose temperature is below the second temperature threshold is swapped to the data storage module. The data information includes the number of data queries, the caching time, and so on. It should be noted that, whereas the prior art generally classifies the data itself into fixed tiers, the classification in this application is dynamic: data stored in the first cache module is of the hot data type, data stored in the second cache module is of the warm data type, and data stored in the data storage module is of the cold data type, rather than the data itself carrying a fixed temperature class.
Further, no swap-out operation is performed on data that is in use, and if the swap-out cannot be completed according to the temperature information of the data, the data with the lowest temperature is swapped out. In addition, when it is detected that certain cached data is no longer needed, for example after a drop table or truncate table, that cached data is evicted from the cache directly and the cache control information is updated. In this way, the space usage of the cache module can be monitored and the data in it fully refreshed, so the cache module can swap out excess data in time and keep its storage layout continuously updated as the user's data usage requirements change.
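The swap-out pass above can be sketched as follows, assuming the temperature of each block has already been computed; the threshold values and data structures are hypothetical.

```python
def swap_out(hot, warm, cold_storage, temps, first_thr, second_thr, in_use):
    """Demote blocks by temperature: hot -> warm below first_thr,
    anything below second_thr -> cold storage. In-use blocks are skipped."""
    for key in list(hot):
        if key in in_use:
            continue                              # never swap out data in use
        if temps[key] < second_thr:
            cold_storage[key] = hot.pop(key)      # straight down to cold storage
        elif temps[key] < first_thr:
            warm[key] = hot.pop(key)              # memory tier -> disk tier
    for key in list(warm):
        if key not in in_use and temps[key] < second_thr:
            cold_storage[key] = warm.pop(key)     # disk tier -> cold storage

hot, warm, cold = {"a": 1, "b": 2, "c": 3}, {}, {}
swap_out(hot, warm, cold, {"a": 90, "b": 40, "c": 5},
         first_thr=60, second_thr=10, in_use=set())
assert hot == {"a": 1} and warm == {"b": 2} and cold == {"c": 3}
```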
In one embodiment, as shown in fig. 4, which is a schematic flowchart of data swap-in and swap-out in an embodiment of the application, the cache module includes cache control information, and calculating the temperature information of the dynamic hot data in the cache module based on the number of data queries includes:
calculating the temperature information of each piece of dynamic data based on the initial temperature, the number of data queries and a preset cooling curve recorded in the cache control information for all dynamic data, wherein the initial temperature is obtained according to a preset initial temperature setting.
Specifically, the cache temperature is calculated as:

Temp = T × S_f + α × C_f - β × g(t_c - t_i)

wherein α, β and T are empirical coefficients that can be adjusted independently by the user. T × S_f is the initial temperature: when data is stored in the cache module it is given an initial temperature, calculated by multiplying the size S_f of the cached data by T. α × C_f is the reference count term; every time the data is accessed, the cache temperature rises. β × g(t_c - t_i) is the cooling term computed from the preset cooling curve g, where t_i is the time at which the data was initially loaded into the cache module and t_c is the current time, i.e. the cache temperature slowly drops over time. The application further comprises the cache data manager shown in fig. 4, through which the access, swap-in, swap-out and other operations on cold, warm and hot data are completed. Through this temperature calculation, the cache data manager obtains the temperature results of the hot data type and the warm data type according to the information recorded in the cache control information, transfers dynamic hot data whose temperature is lower than the first threshold to the warm data type, transfers dynamic warm data whose temperature is lower than the second threshold to the cold data storage module, and directly transfers data being accessed by the user to the hot data type without going through the swap-in and swap-out process, wherein the hot data type may be stored in memory and the warm data type may be stored on disk. The temperature of the cache data can thus be judged by the above formula, which takes into account the influence of the initial temperature, the number of accesses and the retention time on the data temperature, so that the calculated data temperature fits how the data is actually accessed as closely as possible.
In one embodiment, after obtaining the data to be cached, and caching the data to be cached as initial cold data to a preset data storage module, the method further includes:
acquiring a preset feature extraction algorithm;
searching the current dynamic hot data in the initial cold data based on a feature extraction algorithm;
and transferring the current dynamic hot data from the data storage module to the cache module to obtain new dynamic hot data.
Specifically, after a large amount of data to be cached is loaded, the data that the user is likely to query at high frequency is predicted among the initial cold data according to a feature extraction algorithm input by the user in advance, yielding the current dynamic hot data, which is then migrated to the cache module. In this way, the user can flexibly screen out the high-frequency data that may be used, laying a foundation for the user's subsequent efficient queries of target data.
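A minimal sketch of this prefetch step follows; `predict_hot` stands in for the user-supplied feature extraction algorithm, and all names are hypothetical. Entries are copied rather than removed, since the data storage module remains the system of record in this scheme.

```python
def prefetch_hot(cold_store, cache, predict_hot):
    """Copy entries the user-supplied feature extraction algorithm flags
    as likely high-frequency queries from cold storage into the cache."""
    for key in list(cold_store):
        if predict_hot(key, cold_store[key]):
            cache[key] = cold_store[key]  # becomes current dynamic hot data
    return cache
```

For example, a user expecting heavy traffic on recent partitions might supply a predicate that matches the current year in the key.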
This embodiment further provides a specific example of the data caching method, as shown in fig. 5, which is a schematic flow chart of the data caching method in a preferred embodiment.
The application is generally applied to a distributed database, typically one with storage-compute separation and a master-slave node architecture. The master node is responsible for receiving and parsing user instructions; the slave nodes are stateless compute nodes of an eMPP architecture, responsible for executing user instructions, processing data and the like, and multiple compute nodes can perform data access operations simultaneously.
The data storage module further comprises a metadata service module and an object storage service module: user data of the cold data type is stored in the object storage service module, metadata of the cold data type is stored in the metadata service module, and the metadata and user data in the data storage module are not stored in bound form. Specifically, the application further comprises a data manager and a data loader, which together complete the data access, data migration, swap-in, swap-out and other operations in the method.
After a large amount of data to be cached is obtained for the first time, all of it is stored in the metadata service and the object storage service in the data storage module; at this point all the data is of the cold data type. The data loader then screens dynamic hot data out of the data to be cached according to a feature extraction algorithm input by the user in advance and stores it in the cache module, the dynamic hot data being the data predicted by the feature extraction algorithm to be accessed at high frequency.
In the data query stage, the data manager acquires the user's current query instruction and queries the cache control information of the cache module according to it to obtain a cache result, the cache control information including an index of the cache data, cache data statistics and the like. If the cache result indicates that no target user data matching the current query instruction exists in the cache module, the data manager forwards the current query instruction to the data loading module and the data loader queries the metadata service module; that is, if the cache module does not hold the data, the query goes directly to the metadata service module. After acquiring the target metadata information, the data loader queries the target user data in the object storage service module, binds the target metadata information with the target user data information, stores them in the first cache module, and updates the cache control information once the storage is completed.
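The cache-miss path just described can be sketched as follows. Dictionaries stand in for the cache control information, the first cache module, the metadata service and the object storage service; every name and key is illustrative.

```python
def query(instruction, cache_control, first_cache, metadata_svc, object_store):
    """Read-through sketch of the miss path (all names illustrative)."""
    key = instruction                  # assume the instruction itself keys the data
    if key in cache_control:
        return first_cache[cache_control[key]]       # hit: serve from the cache module
    # miss: the data loader queries the metadata service, then object storage
    meta = metadata_svc[key]
    user_data = object_store[meta["object_id"]]
    bound = {"metadata": meta, "data": user_data}    # bind metadata + user data
    slot = len(first_cache)
    first_cache[slot] = bound
    cache_control[key] = slot          # update cache control information last
    return bound
```

Updating the control information only after the bound entry is stored mirrors the order stated above, so a concurrent lookup never sees an index for data that is not yet cached.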
The data manager returns the address information of the metadata in the cache module to the compute node; the compute node sends a user data query request according to the metadata address information, the data manager returns the address information of the user data, and the compute node then queries the user data according to that address information, thereby completing the query of the target user data. Further, if the cache result indicates that target user data matching the current query instruction does exist in the cache module, the data manager returns the address information of the metadata in the cache module to the compute node directly, without going through the data loader.
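The address-first protocol can be sketched as below. The address scheme, class names and methods are invented for illustration and do not appear in the application; the point is only that addresses travel before payloads.

```python
class DataManager:
    """Illustrative manager that answers with addresses; the compute node
    dereferences them in follow-up requests."""

    def __init__(self, cache):
        self.cache = cache                          # addr -> payload

    def metadata_address(self, query_key):
        return "meta:" + query_key                  # hypothetical address scheme

    def user_data_address(self, meta_addr):
        return meta_addr.replace("meta:", "data:")  # one-to-one with the metadata

    def fetch(self, addr):
        return self.cache[addr]


def compute_node_query(manager, query_key):
    meta_addr = manager.metadata_address(query_key)    # step 1: metadata address
    data_addr = manager.user_data_address(meta_addr)   # step 2: user-data address
    return manager.fetch(data_addr)                    # step 3: fetch by address
```

Because only the final step moves the payload, the earlier round trips carry short addresses, which is how the scheme keeps metadata and user data off the network until they are actually needed.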
By storing different types of data separately according to their temperature classification, the load on the metadata service and the object storage service is effectively reduced, the latency of user queries is lowered, and the cost of using the database is decreased; further, by first returning the address and then returning the data itself according to that address, the bandwidth consumed transmitting metadata and user data over the network can be reduced, improving the overall performance of the database cluster; and binding the metadata and the user data for storage improves the query speed of the cache data and reduces the cost of user queries.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated. Unless explicitly stated herein, the order of execution is not strictly limited and the steps may be executed in other orders. Moreover, at least some of the steps in these flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times, and which are not necessarily performed sequentially but may be performed in turn or alternately with at least some of the other steps or sub-steps.
Based on the same inventive concept, an embodiment of the application further provides a data cache processing apparatus for implementing the data cache processing method described above. Since the implementation of the solution provided by the apparatus is similar to that described for the method, for the specific limitations in the embodiments of the data cache processing apparatus below, reference may be made to the limitations of the data cache processing method, which are not repeated here.
In one embodiment, as shown in fig. 6, there is provided a data cache processing apparatus, including: an acquisition module 61, a query module 62 and a generation module 63, wherein:
the obtaining module 61 is configured to obtain data to be cached, and cache the data to be cached as initial cold data to a preset data storage module;
the query module 62 is configured to retrieve current dynamic hot data from the initial cold data based on the detected number of data queries in the data query process, and migrate the current dynamic hot data from the data storage module to a preset cache module;
the generating module 63 is configured to respond to the current query instruction, perform a query in the cache module and, if the query fails, perform a query in the data storage module to obtain target user data, and migrate the target user data from the data storage module to the cache module to obtain new dynamic hot data.
Specifically, the obtaining module 61 stores the data to be cached in the data storage module after obtaining it; the query module 62 retrieves the current dynamic hot data according to the number of data queries during the data query process and transfers it from the data storage module to the cache module; after obtaining the current query instruction, the generating module 63 queries the cache module and, if the query fails, queries the data storage module to obtain the target user data, then migrates the target user data to the cache module to obtain new dynamic hot data.
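The cooperation of the three modules can be sketched as one class. The promotion threshold and every identifier here are illustrative assumptions, not the application's implementation.

```python
class DataCacheProcessor:
    """Sketch of the apparatus: obtaining (61), query (62), generating (63)."""

    def __init__(self):
        self.storage = {}       # data storage module (initial cold data)
        self.cache = {}         # cache module (dynamic hot data)
        self.query_counts = {}

    def obtain(self, data):
        """Obtaining module 61: cache incoming data as initial cold data."""
        self.storage.update(data)

    def track_queries(self, key, hot_threshold=3):
        """Query module 62: promote data once its query count crosses a
        threshold (the threshold value is an illustrative assumption)."""
        self.query_counts[key] = self.query_counts.get(key, 0) + 1
        if self.query_counts[key] >= hot_threshold and key in self.storage:
            self.cache[key] = self.storage[key]

    def handle(self, key):
        """Generating module 63: query the cache first, then storage;
        migrate the result into the cache on a miss."""
        if key in self.cache:
            return self.cache[key]
        value = self.storage[key]
        self.cache[key] = value  # becomes new dynamic hot data
        return value
```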
By means of this apparatus, on the one hand, large amounts of data to be cached are graded by temperature and stored in different locations, which reduces memory usage and the maintenance cost of the apparatus, reduces the number of accesses to the data storage module, and lowers the user's query cost; on the other hand, queried data is migrated to the cache module, enabling prefetch caching of high-frequency data, improving the user's subsequent query speed and reducing the cost of using the database.
The modules in the data cache processing apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, as shown in fig. 7, a data cache processing system is provided, which includes a cache device 71, a data storage device 72, and the data cache processing apparatus 73 described above, including each of its modules.
Specifically, first, the large amount of data to be cached acquired by the data cache processing apparatus 73 is stored in the data storage device 72 and set as the cold data type. The apparatus 73 then acquires some dynamic hot data from the initial cold data in the data storage device 72 according to a feature extraction algorithm preset by the user and caches it in the cache device 71. Next, the apparatus 73 queries the cache device 71 according to the obtained current data query instruction; if the query succeeds, the target user data corresponding to the instruction is returned directly, and if it fails, the query is performed in the data storage device 72, after which the target user data is returned to the user and migrated to the cache device 71. The cache device 71 and the data storage device 72 may each be a server, a computer, a chip or another hardware device for storing data; further, the data cache processing apparatus 73 may implement the data cache processing method when executed by a processor. In some embodiments, the cache device 71 and the data storage device 72 communicate with the apparatus 73 through a transmission device, and in other embodiments they may be integrated directly into the apparatus 73.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The foregoing examples illustrate only a few embodiments of the application, and although they are described in detail, they are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, all of which fall within its scope of protection. Accordingly, the scope of protection of the application should be determined by the appended claims.

Claims (10)

1. A data caching method, the method comprising:
obtaining data to be cached, and caching the data to be cached as initial cold data to a preset data storage module;
in the data query process, based on the detected data query times, retrieving current dynamic hot data from the initial cold data, and transferring the current dynamic hot data from the data storage module to a preset cache module;
and responding to the current query instruction, querying in the cache module, if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
2. The method according to claim 1, wherein the responding to the current query instruction, querying in the cache module, if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data, comprises:
responding to the current query instruction, and acquiring a query frequency result corresponding to the target user data;
and if the query frequency result of the target user data is detected to indicate that the query frequency of the target user data is greater than or equal to a preset query frequency threshold, migrating the target user data from the data storage module to the cache module to obtain the new dynamic hot data.
3. The method of claim 1, wherein the target user data comprises target user data information and target metadata information, the method further comprising:
and binding and storing the target user data information and the target metadata information in the cache module, wherein the target user data information and the target metadata information are in one-to-one correspondence.
4. A method according to claim 3, wherein the target user data comprises a target user data address and target data, wherein the target data comprises the target user data information and the target metadata information; after the new dynamic hot data is obtained, or after the current dynamic hot data is queried successfully in response to the current query instruction, the method further comprises:
acquiring the target user data address in the cache module based on the current query instruction;
and responding to a target data query instruction obtained based on the target user data address, and returning the target data based on the target data query instruction to realize the query of the target user data.
5. The method of claim 4, wherein the target user data address comprises a target metadata address and a target data address, wherein the responding to the target data query instruction obtained based on the target user data address returns the target data based on the target data query instruction, and the querying the target user data comprises:
responding to a target data address query instruction obtained based on the target metadata address, and returning the target data address, wherein the target metadata address and the target data address are in the one-to-one correspondence;

and responding to a query instruction obtained based on the target data address, returning the target data, so as to realize the query of the target user data.
6. The method of claim 1, wherein the cache module comprises a first cache module and a second cache module, the current dynamic hot data being stored in the first cache module, the method further comprising:

if it is detected that the amount of data in the cache module equals a preset cache threshold, calculating temperature information of the dynamic hot data in the cache module based on the number of data queries;
if current dynamic warm data is retrieved from the current dynamic hot data based on the temperature information, storing the current dynamic warm data into the second cache module, wherein the temperature information of the current dynamic warm data is smaller than a preset first temperature threshold;

and if current dynamic cold data is retrieved from the current dynamic warm data based on the temperature information, storing the current dynamic cold data into the data storage module, wherein the temperature information of the current dynamic cold data is smaller than a preset second temperature threshold, and the second temperature threshold is smaller than the first temperature threshold.
7. The method of claim 6, wherein the cache module includes cache control information, and wherein the calculating the temperature information of the dynamic hot data in the cache module based on the number of data queries comprises:
and calculating the temperature information of the dynamic data based on the initial temperatures corresponding to all the dynamic data recorded in the cache control information, the data query times and a preset cooling curve, wherein the initial temperatures are obtained according to preset initial temperature settings.
8. The method of claim 1, wherein after the obtaining the data to be cached and caching the data to be cached as the initial cold data to a preset data storage module, the method further comprises:
acquiring a preset feature extraction algorithm;
searching the current dynamic hot data in the initial cold data based on the feature extraction algorithm;
and migrating the current dynamic hot data from the data storage module to the cache module to obtain the new dynamic hot data.
9. A data cache processing apparatus, the apparatus comprising:
the acquisition module is used for acquiring data to be cached, and caching the data to be cached as initial cold data to the preset data storage module;
the query module is used for retrieving current dynamic hot data from the initial cold data based on the detected data query times in the data query process, and transferring the current dynamic hot data from the data storage module to a preset cache module;
and the generating module is used for responding to the current query instruction, querying in the cache module and, if the query fails, querying in the data storage module to obtain target user data, and migrating the target user data from the data storage module to the cache module to obtain new dynamic hot data.
10. A data cache processing system, characterized in that the system comprises a cache device, a data storage device, and a data cache processing apparatus according to claim 9.
CN202310736341.4A 2023-06-20 2023-06-20 Data caching processing method, device and system Pending CN116974467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310736341.4A CN116974467A (en) 2023-06-20 2023-06-20 Data caching processing method, device and system


Publications (1)

Publication Number Publication Date
CN116974467A true CN116974467A (en) 2023-10-31

Family

ID=88472075


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160154601A1 (en) * 2014-11-28 2016-06-02 International Business Machines Corporation Disk management in distributed storage system
CN107220185A (en) * 2017-05-23 2017-09-29 建荣半导体(深圳)有限公司 Date storage method, device and flash chip based on flash memory
WO2022218160A1 (en) * 2021-04-14 2022-10-20 华为技术有限公司 Data access system and method, and device and network card
CN115221186A (en) * 2022-06-09 2022-10-21 网易(杭州)网络有限公司 Data query method, system and device and electronic equipment
CN116166691A (en) * 2023-04-21 2023-05-26 中国科学院合肥物质科学研究院 Data archiving system, method, device and equipment based on data division



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination