CN110109954B - Data processing method, system, electronic device and storage medium - Google Patents


Info

Publication number
CN110109954B
Authority
CN
China
Prior art keywords
data
key value
cache service
thread
hot
Prior art date
Legal status
Active
Application number
CN201810058451.9A
Other languages
Chinese (zh)
Other versions
CN110109954A (en)
Inventor
刘玉虎
朱建勇
Current Assignee
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201810058451.9A
Publication of CN110109954A
Application granted
Publication of CN110109954B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/23 Updating
    • G06F 16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/24 Querying
    • G06F 16/245 Query processing
    • G06F 16/2455 Query execution
    • G06F 16/24552 Database cache management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a data processing method, system, electronic device, and storage medium, belonging to the field of computer technology. The method comprises the following steps: a write thread receives a write request carrying first data and a key; if second data corresponding to the key is stored in the underlying storage service, the write thread updates the second data in the underlying storage service with the first data and queries the cache service for the key; if the query result shows that the hot-load thread is loading the second data into the cache service and the key exists in the binlog center, the write thread stores a binlog of the first data under the key in the binlog center; after the hot-load thread successfully loads the second data into the cache service, it fetches the binlogs for the key from the binlog center, derives the first data in the cache service from the second data and the binlogs, and deletes the key and the binlogs from the binlog center; the cache service stores the first data. The invention ensures consistency between the data in the underlying storage service and the cache service.

Description

Data processing method, system, electronic device and storage medium
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a data processing method, a system, an electronic device, and a storage medium.
Background
Internet services involve a large number of read and write tasks, which demand fast data access. At present, storage such as magnetic disks and SSDs (Solid-State Drives) is commonly employed as the underlying storage. Although such storage has large capacity, reading data from it takes a long time. If hot data were read from this storage on every request, response times would suffer; here, hot data refers to data whose read frequency per unit time exceeds a preset threshold.
In the related art, a data processing system typically includes a write process, a cache service, and an underlying storage service, where the cache service and the underlying storage service are two different implementations, and reading data from the cache service is faster than reading it from the underlying storage service. After the client determines that first data in the underlying storage service is hot data, it loads the first data into the cache service; the write process receives a write request containing second data and a key; if first data corresponding to the key is stored in the underlying storage service, the write process updates the first data in the underlying storage service with the second data; once the synchronization time arrives, the client replaces the first data in the cache service with the second data from the underlying storage service.
Because hot data in the underlying storage service is only synchronized to the cache service periodically, a long synchronization interval means the hot data read by the client may not be the latest data.
Disclosure of Invention
The embodiments of the present invention provide a data processing method, system, electronic device, and storage medium to solve the problem that, when hot data in the underlying storage service is synchronized to the cache service only periodically, the hot data read from the cache service may not be the latest data. The technical solution is as follows:
in one aspect, a data processing method is provided for a data processing system that includes a write thread, a hot-load thread, a cache service, an underlying storage service, and a binlog center, the cache service being used to store hot data; the method comprises the following steps:
the write thread receives a write request, the write request comprising first data and a key;
if second data corresponding to the key is stored in the underlying storage service, the write thread updates the second data in the underlying storage service with the first data and queries the cache service for the key;
if the query result shows that the hot-load thread is loading the second data into the cache service and the key exists in the binlog center, the write thread stores a binlog of the first data under the key in the binlog center, the binlog center storing the keys of hot data;
after the hot-load thread successfully loads the second data into the cache service, the hot-load thread fetches the binlogs from the binlog center according to the key, derives the first data in the cache service from the second data and the binlogs, and deletes the key and the binlogs from the binlog center;
the cache service stores the first data.
In one aspect, a data processing system is provided, the data processing system including a write thread, a hot-load thread, a cache service, an underlying storage service, and a binlog center, the cache service being used to store hot data;
the write thread is configured to receive a write request, the write request comprising first data and a key;
the write thread is further configured to, when second data corresponding to the key is stored in the underlying storage service, update the second data in the underlying storage service with the first data and query the cache service for the key;
the write thread is further configured to, when the query result shows that the hot-load thread is loading the second data into the cache service and the key exists in the binlog center, store a binlog of the first data under the key in the binlog center, the binlog center storing the keys of hot data;
the hot-load thread is configured to, after successfully loading the second data into the cache service, fetch the binlogs from the binlog center according to the key, derive the first data in the cache service from the second data and the binlogs, and delete the key and the binlogs from the binlog center;
the cache service is configured to store the first data.
In one aspect, an electronic device is provided, the electronic device comprising a data processing system as described above.
In one aspect, a computer-readable storage medium is provided, having stored therein at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the data processing method as described above.
The technical scheme provided by the embodiment of the invention has the beneficial effects that:
after the write thread updates the second data in the underlying storage service with the first data, the first data must also be written into the cache service. If, at the moment the write thread writes the first data into the cache service, the hot-load thread is still loading the second data into the cache service, the write thread can store a binlog of the first data in the binlog center. After the hot-load thread successfully loads the second data into the cache service, it can derive the first data in the cache service from the second data and the binlog, so the first data also ends up stored in the cache service. This solves the problem that, when hot data in the underlying storage service is synchronized to the cache service only periodically, the hot data read from the cache service is not the latest data, and ensures consistency between the data in the underlying storage service and the cache service.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a data processing system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram illustrating the transition of four states according to one embodiment of the present invention;
FIG. 3 is a logical schematic of a read thread shown in accordance with one embodiment of the present invention;
FIG. 4 is a logical schematic of a write thread shown in accordance with one embodiment of the present invention;
FIG. 5 is a logical schematic of a hot-load thread shown in accordance with one embodiment of the present invention;
FIG. 6 is a schematic diagram of a cache system according to an embodiment of the present invention;
FIG. 7 is a method flow diagram of a data processing method provided by one embodiment of the present invention;
FIGS. 8-10 are flowcharts of a data processing method according to another embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Referring to FIG. 1, a schematic diagram of a data processing system according to an embodiment of the present invention is shown. The data processing system includes a read thread (Reader), a write thread (Writer), a hot-load thread (Loader), a cache service (Cache), an underlying storage service (disk/SSD), and a binlog center (Binlog-center).
The embodiment of the present invention builds the cache service on top of storage such as disks and SSDs, so that the underlying storage service and the cache service share the same implementation logic and the data in both can be updated at the same time.
FIG. 1 illustrates the connections between the read thread, write thread, hot-load thread, cache service, underlying storage service, and binlog center; the logic of each module is described below:
The states a key can take are described first: an initial state, a loading state, a load-complete state, and an available state. The initial state means the data is stored only in the underlying storage service. The loading state means the data has been determined to be hot data and is being loaded from the underlying storage service into the cache service. The load-complete state means the data has been loaded successfully. The available state means the data in the cache service is available to readers. Referring to FIG. 2, a schematic diagram of the transitions between the four states is shown.
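The four states and their forward-only transitions can be sketched as a small state machine. This is an illustrative Python encoding, not from the patent; the names `KeyState`, `TRANSITIONS`, and `advance` are assumptions:

```python
from enum import Enum, auto

class KeyState(Enum):
    INITIAL = auto()        # data lives only in the underlying storage service
    LOADING = auto()        # data judged hot; hot-load thread is copying it to the cache
    LOAD_COMPLETE = auto()  # copy finished; pending binlogs not yet replayed
    AVAILABLE = auto()      # cached data is consistent and visible to readers

# Hypothetical encoding of the transitions sketched in FIG. 2
TRANSITIONS = {
    KeyState.INITIAL: {KeyState.LOADING},
    KeyState.LOADING: {KeyState.LOAD_COMPLETE},
    KeyState.LOAD_COMPLETE: {KeyState.AVAILABLE},
    KeyState.AVAILABLE: set(),
}

def advance(state: KeyState, new: KeyState) -> KeyState:
    """Move a key to a new state, rejecting transitions FIG. 2 does not allow."""
    if new not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition {state} -> {new}")
    return new
```

Readers only ever see data whose key has reached AVAILABLE, which is what makes the cached copy safe to serve.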
1. Reading threads:
step 1, receive a read request sent by a client and query the cache service for the key carried in the read request; when the cache service holds data corresponding to the key and the key is in the available state, go to step 5; otherwise judge, from the state returned by the cache service, whether the data needs warming: if so, go to step 2, otherwise go to step 4;
step 2, insert the key into the binlog center and go to step 3;
step 3, generate a warm-up task and wake the hot-load thread to execute it;
step 4, query the underlying storage service for the key and return the data it returns to the client;
step 5, return the data returned by the cache service to the client.
Referring to FIG. 3, a logical schematic of the read thread is shown.
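The read-thread steps above can be sketched in a few lines of Python. The in-memory stand-ins for the cache service, underlying storage, and binlog center, and the name `handle_read`, are illustrative assumptions, not from the patent:

```python
def handle_read(key, cache, storage, binlog_center, wake_loader):
    """Read path: serve from cache only when the key is AVAILABLE,
    trigger warming when the cache reports the LOADING state,
    and otherwise fall back to the underlying storage service."""
    entry = cache.get(key)  # (state, value) tuple, or None if the key is unknown
    if entry and entry[0] == "AVAILABLE":
        return entry[1]                    # step 5: serve from the cache service
    if entry and entry[0] == "LOADING":    # cache says the data needs warming
        binlog_center.setdefault(key, [])  # step 2: register the key in the binlog center
        wake_loader(key)                   # step 3: wake the hot-load thread
    return storage[key]                    # step 4: read from the underlying storage
```

Note that even while warming is in flight, the reader is answered from the underlying storage, so the client never blocks on the load.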
2. Write thread:
step 1, receive a write request sent by a client and write the key and data into the underlying storage service; go to step 2 if the write succeeds, otherwise go to step 5;
step 2, query the cache service for the key; when the key is in the initial state, go to step 5; when the key is in the available state, write the key and data into the cache service and go to step 5; otherwise go to step 3;
step 3, when the key is in the loading state or load-complete state, judge whether the key exists in the binlog center; if so, store a binlog in the binlog center in first-in-first-out order and go to step 5, otherwise go to step 4;
step 4, store the key and data in the cache service (mainly to cover the case where the hot-load thread deleted the key from the binlog center just as the write thread accessed it);
step 5, return the write result to the client.
Referring to FIG. 4, a logical schematic of the write thread is shown.
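The write-thread branching above can likewise be sketched. The function name `handle_write` and the dict-based stand-ins are illustrative assumptions; the state strings mirror the four key states:

```python
def handle_write(key, value, cache, storage, binlog_center):
    """Write path: write-through to the underlying storage first, then
    either update the cache directly, queue a binlog, or skip the cache."""
    storage[key] = value                      # step 1: update the underlying storage
    entry = cache.get(key)
    state = entry[0] if entry else "INITIAL"
    if state == "AVAILABLE":
        cache[key] = ("AVAILABLE", value)     # step 2: keep the cached copy current
    elif state in ("LOADING", "LOAD_COMPLETE"):
        if key in binlog_center:
            binlog_center[key].append(value)  # step 3: FIFO binlog for later replay
        else:
            cache[key] = (state, value)       # step 4: loader already drained the binlogs
    # state == "INITIAL": data is not hot, the cache is not touched
    return "ok"                               # step 5: report the result to the client
```

The key point is that a write never waits for the hot-load thread; when loading is in progress it leaves a binlog instead of touching the half-loaded entry.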
It should be noted that, in the related art, the read and write paths are coupled: when data in the underlying storage service is being loaded into the cache service, the data is locked in the underlying storage service, so the write path must wait until loading finishes before it can write updated data into the underlying storage service, which hurts system performance. In the embodiment of the present invention, the read thread and the write thread are decoupled. This is possible because reading and writing can proceed simultaneously: while data in the underlying storage service is being loaded into the cache service, the write thread can, in addition to writing the updated data into the underlying storage service, save a binlog of the data in the binlog center, so that the data in the cache service can later be updated according to the binlog.
3. Hot-load thread:
step 1, after being woken, fetch a warm-up task from the task list and fetch the data from the underlying storage service according to the task;
step 2, cache the data in the cache service and change the state of the key to the load-complete state;
step 3, acquire the lock corresponding to the key from the binlog center and fetch the binlogs of the data;
step 4, replay the binlogs in the cache service and set the key to the available state;
step 5, delete the key and the binlogs from the binlog center and release the lock.
Referring to FIG. 5, a logical schematic of the hot-load thread is shown.
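The hot-load steps above amount to: copy, mark load-complete, then replay pending binlogs under a per-key lock before flipping the key to available. A minimal sketch, with the per-key lock table `_key_locks` and the function name `hot_load` as assumptions:

```python
import threading

_key_locks = {}  # hypothetical per-key locks guarding the binlog drain

def hot_load(key, cache, storage, binlog_center):
    value = storage[key]                           # step 1: fetch from underlying storage
    cache[key] = ("LOAD_COMPLETE", value)          # step 2: cache it, mark load complete
    lock = _key_locks.setdefault(key, threading.Lock())
    with lock:                                     # step 3: lock this key's binlogs
        for update in binlog_center.pop(key, []):  # steps 3/5: drain and delete the FIFO binlogs
            value = update                         # step 4: replay each binlog in arrival order
        cache[key] = ("AVAILABLE", value)          # step 4: key becomes visible to readers
```

Because the binlogs are drained inside the lock and the key only becomes AVAILABLE afterwards, a reader can never observe the cached value before the queued writes have been applied.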
4. Caching service:
when a read request for a key is received for the first time, the cache service records the key and its access count and sets the state of the key to the initial state; when the subsequently recorded access count of the data corresponding to the key within a unit time reaches the preset threshold, the state of the key is changed to the loading state to instruct the read thread to warm the data.
Referring to FIG. 6, a schematic diagram of the cache system is shown. head is an array addressed by hash, and an item (data item) is a key-value pair; after a key is hashed it maps to a head, and that head points to the item containing the key. level is an LRU (Least Recently Used) queue; in embodiments of the present invention, the item containing a key can be assigned to an LRU queue based on the key's access count, where assigning an item to an LRU queue means linking the item into the queue's doubly linked list. This embodiment can support different data types, such as the kv (key-value) structure, hash structure, zset (sorted set) structure, and list structure, and the embodiment is not limited in this respect; different data types differ in their item structure.
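The head-array-plus-LRU-queue layout can be illustrated with a tiny least-recently-used cache. This stand-in uses Python's `OrderedDict` in place of the explicit hash array and doubly linked list; the class name `TinyLRUCache` is an assumption for illustration only:

```python
from collections import OrderedDict

class TinyLRUCache:
    """Minimal stand-in for the head/item/LRU layout: hash lookup plus
    least-recently-used eviction (illustrative, not the patent's structure)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()   # key -> value; insertion order tracks recency

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # touched: move to the most-recently-used end
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least-recently-used item
```

A production cache would keep separate LRU queues per access-count level, as FIG. 6 suggests, but the eviction principle per queue is the same.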
5. The underlying storage service:
the underlying storage service is used to store key values and data correspondingly.
6. Binlog center:
the binlog center is used to store the binlogs of data in first-in-first-out order.
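A first-in-first-out per-key log store can be sketched with a deque per key. The class name `BinlogCenter` and its methods are illustrative assumptions:

```python
from collections import defaultdict, deque

class BinlogCenter:
    """FIFO store of pending updates per key (sketch, not the patent's API)."""
    def __init__(self):
        self._logs = defaultdict(deque)

    def append(self, key, binlog):
        self._logs[key].append(binlog)        # newest binlog goes to the tail

    def drain(self, key):
        """Return all binlogs for a key, oldest first, and delete the key."""
        return list(self._logs.pop(key, ()))
```

`drain` combines the hot-load thread's fetch-then-delete into one step, matching the requirement that the key and its binlogs are removed together once replay finishes.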
The system decouples the read thread and the write thread, and the built-in cache service can be accessed with only minor changes to the original system. Meanwhile, the hot-load thread runs independently and can be told, by command, which data needs warming. Data in the cache service is visible to users only when its key is in the available state, which guarantees consistency between the data in the cache service and the data in the underlying storage service.
In the related art, since the underlying storage service and the cache service are different implementations, hot data in the underlying storage service can only be synchronized into the cache service periodically. If the synchronization interval is large, say one minute, the hot data may change in the underlying storage service within that minute, yet the cache service keeps returning the pre-change value to readers, so the hot data read from the cache service is not the latest. In this embodiment, as described above, the underlying storage service and the cache service share the same implementation, so the write thread can write data into the cache service right after successfully writing it into the underlying storage service, ensuring data consistency.
Referring now to FIG. 7, a flowchart illustrating a method of data processing according to one embodiment of the present invention is shown for use in the data processing system of FIG. 1. The data processing method comprises the following steps:
in step 701, a write process receives a write request, the write request including first data and a key value.
The write request may be sent by the client to the write process.
In step 702, if the second data corresponding to the key value is stored in the bottom storage service, the writing process updates the second data stored in the bottom storage service by using the first data, and queries the cache service for the key value.
The bottom layer storage service stores a plurality of groups of key value pairs, each group of key value pairs comprises a key value and data, and the key value is equivalent to the index of the data.
In this embodiment, the key value in step 702 is the same as the key value in step 701, and the first data may be regarded as an update to the second data.
When the data needs to be updated, the first data can be used for replacing the second data in the key value pair, so that a key value and the first data group of key value pairs are obtained.
The cache service is used for storing hot data, and the hot data can be statistically determined for the access times or set at the operation side, and the method for determining the hot data will be described in detail hereinafter, which is not described here again. If the hot data is set manually, the hot data can also be called as pre-load data to meet the requirement of the operation side.
When the write thread queries the cache service for the key, the following cases may occur:
1. the key is not stored in the cache service; since neither the key nor the second data is cached, a subsequent read will fetch the first data from the underlying storage service, so the write thread need not write the first data into the cache service and directly returns a write-success result to the client;
2. the key is cached but is in the initial state; in the initial state the cache service does not hold the second data, so again a subsequent read fetches the first data from the underlying storage service, the write thread need not write the first data into the cache service, and a write-success result is returned directly to the client;
3. the key is cached and is in the loading state or load-complete state; step 703 is executed;
4. the key is cached and is in the available state; since a subsequent read can fetch the data directly from the cache service in this state, the write thread must update the second data stored in the cache service with the first data, to ensure the first data in the cache service matches the first data in the underlying storage, and a write-success result is returned to the client.
In step 703, if the query result shows that the hot-load thread is loading the second data into the cache service and the key exists in the binlog center, the write thread stores a binlog of the first data under the key in the binlog center; the binlog center stores the keys of hot data.
The key is inserted into the binlog center by the read thread when the second data is determined to be hot data.
The binlog center may contain the key but no binlogs yet, in which case the binlog is stored directly under the key; the key and earlier binlogs may already exist, in which case the new binlog is appended last, in generation order; the key may also be absent from the binlog center, a case described in step 812 and not repeated here.
It should be noted that, while the hot-load thread is loading the second data into the cache service, the write thread may write data for the key multiple times; each write must store a binlog under the key in the binlog center, and a later binlog does not replace an earlier one. For example, if the write thread writes data for the key three times while the hot-load thread is loading the second data, the binlog center stores three binlogs under the key.
Step 704: after the hot-load thread has loaded the second data into the cache service, it fetches the binlogs from the binlog center according to the key, derives the first data in the cache service from the second data and the binlogs, and deletes the key and the binlogs from the binlog center.
A binlog may be an instruction to write data.
Deriving the first data from the second data and the binlog means replacing the second data with the first data carried in the binlog. For example, if the content of the binlog is key 1 → identifier 2, and the data cached under key 1 is identifier 1, then identifier 1 under key 1 in the cache service is replaced with identifier 2.
After the hot-load thread has processed all the binlogs, it deletes the key and the binlogs from the binlog center.
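The replay just described is simply applying the queued writes in FIFO order, so the cache ends up holding the most recent write. A minimal sketch; the function name `replay` is an assumption:

```python
def replay(cached_value, binlogs):
    """Apply queued binlogs (each one the value written) in FIFO order;
    the cache is left holding the last write."""
    for binlog in binlogs:
        cached_value = binlog
    return cached_value
```

With the example above, replaying the binlog key 1 → identifier 2 over a cached identifier 1 leaves identifier 2 in the cache.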
In step 705, the cache service stores the first data.
In summary, in the data processing method provided by the embodiment of the present invention, after the write thread updates the second data in the underlying storage service with the first data, the first data must also be written into the cache service. If the hot-load thread is still loading the second data into the cache service at that moment, the write thread stores a binlog of the first data in the binlog center; after the hot-load thread loads the second data successfully, it derives the first data in the cache service from the second data and the binlog, so the first data is also stored in the cache service. This solves the problem that hot data read from the cache service is not the latest data when hot data in the underlying storage service is synchronized to the cache service only periodically, and ensures consistency between the data in the underlying storage service and the cache service.
Referring to FIGS. 8-10, flowcharts of a data processing method according to another embodiment of the present invention are shown; the method is used in the data processing system shown in FIG. 1. The data processing method comprises the following steps:
For ease of understanding, the scenarios covered by the steps are outlined first. Referring to FIG. 8, steps 801-803 describe the scenario in which the second data is written. Referring to FIG. 9, steps 804-812 describe the scenario in which the read thread finds the second data to be hot data while reading it, and the write thread writes the first data while the read thread is warming the second data, the first data being an update to the second data. Referring to FIG. 10, steps 813-816 describe the scenario in which the write thread writes third data while the key corresponding to the first data in the cache service is in the available state, the third data being an update to the first data.
In step 801, the write thread receives a write request, the write request including a key and second data.
The write request may be sent by the client to the write thread.
In step 802, if the key is not yet stored in the underlying storage service, the write thread writes the key and the second data into the underlying storage service and queries the cache service for the key.
The underlying storage service stores multiple key-value pairs; each pair comprises a key and data, the key serving as an index to the data.
If the underlying storage service already stores the key, the data in its key-value pair is replaced with the second data, yielding a key-value pair of the key and the second data; if the underlying storage service does not store the key, the key and the second data are stored as a new pair.
Steps 801 and 802 describe a write that occurs before the second data is determined to be hot data, so querying the cache service for the key has only two possible outcomes: 1. the key is not stored in the cache service; 2. the key is cached and is in the initial state.
In step 803, when it is determined from the query result that the cache service does not store the second data, the write thread does not write the key and the second data into the cache service.
When the write thread queries the cache service for the key, the cache service may return a query result indicating the initial state, from which the write thread can determine that the cache service does not store the second data. This covers either the key not being stored in the cache service, corresponding to case 1, or the key being cached in the initial state, corresponding to case 2.
In the initial state the second data is not hot data, so the cache service does not cache it. If the read thread tries to read the second data from the cache service at this point, the cache service directs it to the underlying storage service, so the second data the read thread finally obtains comes from the underlying storage service, and there is no need to write the second data into the cache service.
In step 804, the read thread receives the read request and queries the cache service for a key value included in the read request.
The read request may be sent by the client to the read thread.
When querying the cache service for key values, the following may occur: 1. the key value is not stored in the cache service; 2. caching the key value in the cache service, wherein the state of the key value is an initial state or a loading completion state; 3. the key value is stored in the cache service, and the state of the key value is a loading state; 4. the key value is stored in the cache service and the state of the key value is an available state.
In case 1, the cache service records the key value, the access count, and the access time, and sets the state of the key value to the initial state. The cache service can then return the initial state, together with feedback that the second data is not stored, to the read thread, so that the read thread can determine from the initial state that the data has not become hot, and read and return the second data from the underlying storage service according to the feedback.
In case 2, the loading completion state means the hot loading thread has already successfully loaded the second data to the cache service, and the initial state means the cache service does not store the second data corresponding to the key value and the second data is not hot data. The cache service returns the loading completion state or the initial state to the read thread; upon receiving either, the read thread reads the second data from the underlying storage service and returns it to the client. To respond to the read request quickly, the cache service may also return feedback that the second data is not stored, so that the read thread can determine from the loading completion state or the initial state that the data has not become hot, and read and return the second data from the underlying storage service according to the feedback.
In case 3, the loading state means the cache service does not store the second data corresponding to the key value and the second data is hot data; step 805 is then executed.
In case 4, the available state means the cache service stores the second data; the cache service returns the second data to the read thread, which returns it to the client.
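The four query outcomes above correspond to a small per-key state machine inside the cache service. The following is a minimal sketch (class and state names are illustrative assumptions, not taken from the patent):

```python
from enum import Enum, auto

class KeyState(Enum):
    INITIAL = auto()        # key recorded, data not hot, data not cached
    LOADING = auto()        # data marked hot, hot-load in progress
    LOAD_COMPLETE = auto()  # hot-load finished, pending records not yet drained
    AVAILABLE = auto()      # data cached and directly readable

class CacheService:
    def __init__(self):
        self._states = {}   # key -> KeyState
        self._data = {}     # key -> value, only meaningful in the AVAILABLE state

    def query(self, key):
        """Return (state, value). A first-time miss registers the key in the
        initial state, so case 1 collapses into case 2 on the next query."""
        if key not in self._states:
            self._states[key] = KeyState.INITIAL
        state = self._states[key]
        value = self._data.get(key) if state is KeyState.AVAILABLE else None
        return state, value
```

Only in the available state does the query return data directly; every other state sends the reader to the underlying storage service.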
In step 805, when it is determined from the query result that the cache service does not store the second data corresponding to the key value and the second data is hot data, the read thread reads the second data from the underlying storage service and returns it to the client, inserts the key value into the pipeline center, generates a heat-up task, and wakes the hot loading thread to execute the heat-up task.
The cache service determines that the second data is hot data when the number of times the key value has been queried per unit time satisfies the preset condition, or when it receives setting information indicating that the second data is to be set as hot data, that is, hot data set by the operation side. The cache service then changes the state of the key value from the initial state to the loading state and returns the loading state to the read thread. To respond to the read request quickly, the cache service may also return feedback that the second data is not stored, so that the read thread can trigger data heating according to the loading state, and read the second data from the underlying storage service and return it to the client according to the feedback.
The preset condition may be a comparison with a threshold (for example, greater than or equal to the threshold); when the access count satisfies the comparison, the access count is determined to satisfy the preset condition.
In addition to identifying hot data by access count per unit time and access time, hot data may also be identified along other dimensions; this embodiment is not limited in this respect.
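Hot-data detection by access count per unit time can be sketched as a sliding-window counter. The threshold, window length, and all names below are illustrative assumptions rather than values from the patent:

```python
import time
from collections import defaultdict, deque

class HotDetector:
    """Count accesses per key within a sliding time window; a key whose
    count reaches the threshold is treated as hot. The operation side can
    also force a key hot, which corresponds to preloaded data."""
    def __init__(self, threshold=100, window_seconds=60.0):
        self.threshold = threshold
        self.window = window_seconds
        self._hits = defaultdict(deque)   # key -> timestamps of recent accesses
        self._forced = set()              # keys set hot by the operation side

    def record_access(self, key, now=None):
        now = time.monotonic() if now is None else now
        hits = self._hits[key]
        hits.append(now)
        while hits and now - hits[0] > self.window:   # drop stale accesses
            hits.popleft()

    def set_hot(self, key):               # operation-side preload request
        self._forced.add(key)

    def is_hot(self, key):
        return key in self._forced or len(self._hits[key]) >= self.threshold
```

The two triggers in the text map to `record_access`/`is_hot` (statistical detection) and `set_hot` (operation-side setting information).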
After generating the heat-up task, the read thread can also add it to a task list, so that once awakened, the hot loading thread executes the heat-up tasks in the task list in order.
It should be noted that hot data set by the operation side may also be referred to as preloaded data, since it serves operation-side requirements.
In step 806, the hot loading thread loads the second data as hot data to the cache service under the instruction of the heat-up task.
In step 807, the write thread receives a write request, the write request containing the first data and the key value.
The write request may be sent by the client to the write process.
In step 808, if the second data corresponding to the key value is stored in the underlying storage service, the write thread updates the second data stored there with the first data and queries the cache service for the key value.
The key values corresponding to the first data and the second data are the same, so the first data can be regarded as an update to the second data.
When the data needs to be updated, the first data can replace the second data in the key-value pair, yielding a key-value pair consisting of the key value and the first data.
Steps 807 and 808 describe a write operation after the second data has been determined to be hot data. When the write thread queries the cache service for the key value here, only one situation arises: the key value is stored in the cache service, and its state is the loading state or the loading completion state.
In step 809, if it is determined from the query result that the hot loading thread is loading the second data to the cache service and the key value exists in the pipeline center, the write thread stores a pipeline record of the first data in the pipeline center, corresponding to the key value; the pipeline center stores the key values of hot data.
The key value is inserted into the pipeline center by the read thread when the second data is determined to be hot data.
The pipeline center may contain the key value but no pipeline records yet, in which case the record is stored directly under the key value; or the pipeline center may already contain the key value and earlier records, in which case the new record is appended last, following the order in which the records were generated. The situation where the key value is not present in the pipeline center is described in step 812 and is not repeated here.
In step 810, after the hot loading thread successfully loads the second data to the cache service, the hot loading thread obtains the pipeline records from the pipeline center according to the key value, obtains the first data in the cache service according to the second data and the pipeline records, and deletes the key value and the pipeline records from the pipeline center.
Specifically, this comprises: the hot loading thread obtains, according to the key value, the corresponding lock and pipeline records from the pipeline center, the lock being used to lock the key value; the hot loading thread obtains the first data according to the second data and the pipeline records in the cache service; and the hot loading thread deletes the key value and the pipeline records in the pipeline center and releases the lock.
While the key value is locked, the write thread cannot write a pipeline record for data corresponding to that key value to the pipeline center; it must wait until the hot loading thread has deleted the key value and the pipeline records from the pipeline center and released the lock, after which the write thread executes step 812.
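The lock-and-replay interaction between the hot loading thread and the pipeline center (steps 805 and 810) might be sketched as follows. Class and method names are illustrative assumptions, and each pipeline record is simplified to a full overwrite of the value:

```python
import threading

class PipelineCenter:
    """A per-key lock plus an ordered list of pending write records.
    A sketch of the replay performed in step 810; not the patent's API."""
    def __init__(self):
        self._locks = {}      # key -> lock guarding that key's records
        self._records = {}    # key -> pending values, oldest first

    def insert_key(self, key):              # done by the read thread in step 805
        self._locks.setdefault(key, threading.Lock())
        self._records.setdefault(key, [])

    def append(self, key, value):           # write thread, while hot-load runs
        with self._locks[key]:              # blocks while replay holds the lock
            self._records[key].append(value)

    def replay_and_remove(self, key, loaded_value, cache):
        """Hot loading thread: lock the key, fold the pending records onto
        the loaded data, store the result, then delete key and records."""
        with self._locks[key]:              # write thread must wait here
            result = loaded_value
            for pending in self._records[key]:
                result = pending            # later records overwrite earlier ones
            cache[key] = result
            del self._records[key]
        del self._locks[key]                # key freed; writes now go to the cache
        return result
```

Folding the records in generation order is what turns the loaded second data into the latest first data.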
In step 811, the caching service stores the first data.
With data heating performed as described above, in scenarios where a small amount of hotspot data exists within a large storage capacity, only a small amount of memory is needed to identify hot data efficiently, which improves both system performance and storage efficiency.
After the second data is loaded into the cache service, the key value pairs of each group stored in the cache service can be divided into different least recently used LRU queues according to the access times of the key values, and each key value pair comprises the key value and the data; and if the storage capacity of the cache service reaches a preset threshold, deleting the key value pair from the LRU queue with the lowest access times.
That is, key-value pairs are divided into different LRU queues according to the access counts of their key values; when the capacity of the cache service reaches its upper limit, key-value pairs are deleted starting from the queue with the lowest access counts, and deletion stops once enough capacity has been freed.
Optionally, each LRU queue is scanned at predetermined intervals, and key-value pairs in each queue that have not been accessed for longer than a preset duration are deleted, ensuring that all data retained in the cache service is hot data.
Therefore, by designing multiple LRU queues and tracking both access count and access time, hot data is identified by access frequency and evicted by access time, avoiding the problem of genuinely hot data being evicted because the cache service runs out of capacity.
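The multi-queue LRU scheme can be sketched as follows. The tier boundaries and capacity are illustrative assumptions, not values specified by the patent, and for brevity the sketch assumes `put` is called once per key:

```python
from collections import OrderedDict

class TieredLRUCache:
    """Key-value pairs are grouped into LRU queues by access count; when
    capacity is exceeded, eviction starts from the lowest-count queue."""
    def __init__(self, capacity=4, tiers=(10, 100)):
        self.capacity = capacity
        self.tiers = tiers                           # access-count boundaries
        self.queues = [OrderedDict() for _ in range(len(tiers) + 1)]
        self.counts = {}                             # key -> access count

    def _tier(self, count):
        for i, bound in enumerate(self.tiers):
            if count < bound:
                return i
        return len(self.tiers)

    def get(self, key):
        for q in self.queues:
            if key in q:
                value = q.pop(key)
                self.counts[key] += 1
                # re-insert in the queue matching the new access count
                self.queues[self._tier(self.counts[key])][key] = value
                return value
        return None

    def put(self, key, value):
        self.counts.setdefault(key, 0)
        self.queues[self._tier(self.counts[key])][key] = value
        while sum(len(q) for q in self.queues) > self.capacity:
            for q in self.queues:                    # coldest queue first
                if q:
                    q.popitem(last=False)            # least recently used entry
                    break
```

Frequently accessed keys migrate to higher tiers and are therefore the last candidates for eviction, which is the property the text describes.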
In step 812, if it is determined from the query result that the hot loading thread has successfully loaded the second data to the cache service and the key value does not exist in the pipeline center, the write thread writes the first data in the cache service, corresponding to the key value.
In this situation the hot loading thread may have deleted the key value from the pipeline center after processing the last pipeline record; to guarantee data consistency, the write thread therefore writes the first data corresponding to the key value in the cache service.
In step 813, the write thread receives a write request containing the key value and the third data.
The write request may be sent by the client to the write process.
In step 814, if the first data corresponding to the key value is stored in the underlying storage service, the write thread updates it with the third data and queries the cache service for the key value.
The key values corresponding to the first data and the third data are the same, so the third data can be regarded as an update to the first data.
When the data needs to be updated, the third data can replace the first data in the key-value pair in the underlying storage service, yielding a key-value pair consisting of the key value and the third data.
Steps 813 and 814 describe a write operation performed after the second data has been cached and the state of the key value is the available state. When the write thread queries the cache service for the key value here, only one situation arises: the key value is stored in the cache service, and its state is the available state.
In step 815, when it is determined from the query result that the first data is stored in the cache service, the write thread updates the first data stored there with the third data.
When the data needs to be updated, the third data can replace the first data in the key-value pair in the cache service, yielding a key-value pair consisting of the key value and the third data.
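The write-path branching described in steps 803, 812, and 815, together with the pipelined case above, can be sketched as a single dispatch function. This is a hypothetical sketch: the state names, return labels, and dict-based services are illustrative, not from the patent:

```python
def handle_write(key, new_value, storage, cache, states, pipeline):
    """Dispatch a write over the possible cache-query outcomes."""
    storage[key] = new_value                  # underlying storage is always updated first
    state = states.get(key)
    if state in (None, "INITIAL"):
        return "skipped-cache"                # step 803: data is not hot, do not cache
    if state == "LOADING" and key in pipeline:
        pipeline[key].append(new_value)       # park the record while hot-load runs
        return "pipelined"
    if state == "LOAD_COMPLETE" and key not in pipeline:
        cache[key] = new_value                # step 812: load finished, records drained
        return "written-to-cache"
    if state == "AVAILABLE":
        cache[key] = new_value                # step 815: update cached data in place
        return "updated-in-cache"
    return "waiting-for-lock"                 # remaining edge: wait for key to unlock
```

In every branch the underlying storage service is updated; only the cache-side handling differs, which is what keeps the two stores consistent.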
In summary, in the data processing method provided by this embodiment of the present invention, after the write thread updates the second data stored in the underlying storage service with the first data, the first data still needs to be written to the cache service. If, when the write thread attempts this, the hot loading thread is still loading the second data into the cache service, the write thread stores a pipeline record of the first data in the pipeline center. After the hot loading thread successfully loads the second data, it obtains the first data in the cache service according to the second data and the pipeline records, so the first data also ends up stored in the cache service. This solves the problem that, when hot data in the underlying storage service is periodically synchronized to the cache service, the hot data read from the cache service may not be the latest data, thereby ensuring consistency between the data in the underlying storage service and in the cache service.
If it is determined from the query result that the hot loading thread has successfully loaded the second data to the cache service and the key value is absent from the pipeline center, the hot loading thread may have deleted the key value after processing the last pipeline record; in that situation the write thread writes the first data corresponding to the key value in the cache service to ensure data consistency.
With the built-in cache service, from the user's perspective only one write thread needs to be triggered to write data into both the underlying storage service and the cache service, whereas in the related art the user must trigger two write threads to write the data into the underlying storage service and the cache service respectively; the complexity of write operations is thus reduced.
The operation side can set data that is likely to experience a sudden burst of accesses in the future as hot data and load it into the cache service through the hot loading thread, thereby meeting operation-side requirements.
By designing multiple LRU queues and tracking both access count and access time, hot data is identified by access frequency and evicted by access time, avoiding the problem of genuinely hot data being evicted because the cache service runs out of capacity.
With reference now to FIG. 1, a block diagram illustrating a data processing system is depicted in accordance with one embodiment of the present invention. The data processing system includes:
the write thread is configured to receive a write request, the write request containing first data and a key value;
the write thread is further configured to, when second data corresponding to the key value is stored in the underlying storage service, update the second data stored in the underlying storage service with the first data and query the cache service for the key value;
the write thread is further configured to store, when it is determined from the query result that the hot loading thread is loading the second data to the cache service and the key value exists in the pipeline center, a pipeline record of the first data corresponding to the key value in the pipeline center, the pipeline center storing the key values of hot data;
the hot loading thread is configured to, after successfully loading the second data to the cache service, obtain the pipeline records from the pipeline center according to the key value, obtain the first data in the cache service according to the second data and the pipeline records, and delete the key value and the pipeline records in the pipeline center;
and the cache service is used for storing the first data.
Optionally, the write thread is further configured to write the first data corresponding to the key value in the cache service when it is determined from the query result that the hot loading thread has successfully loaded the second data to the cache service and the key value does not exist in the pipeline center.
Optionally, the data processing system further comprises a read thread;
the read thread is configured to receive a read request and query the cache service for the key value contained in the read request;
the read thread is further configured to, when it is determined from the query result that the cache service does not store second data corresponding to the key value and the second data is hot data, read the second data from the underlying storage service and return it to the client, insert the key value into the pipeline center, generate a heat-up task, and wake the hot loading thread to execute the heat-up task;
the hot loading thread is further configured to load the second data to the cache service under the instruction of the heat-up task.
Optionally, when the number of times of querying the key value in the unit time counted by the cache service meets a preset condition, determining that the second data is hot data; or, when the caching service acquires setting information for indicating that the second data is set as hot data, it is determined that the second data is hot data.
Optionally, the read thread is further configured to read the second data from the underlying storage service and return it to the client when it is determined from the query result that the hot loading thread has successfully loaded the second data to the cache service, or that the cache service does not store the second data corresponding to the key value and the second data is not hot data; or, when it is determined from the query result that the second data is stored in the cache service, to read the second data from the cache service and return it to the client.
Optionally, the write thread is further configured to receive, before the write request containing the first data and the key value, a write request containing the key value and the second data;
the write thread is further configured to, when the key value is not stored in the underlying storage service, write the key value and the second data in the underlying storage service and query the cache service for the key value;
the write thread is further configured not to write the key value and the second data into the cache service when it is determined from the query result that the cache service does not store the second data.
Optionally, the write thread is further configured to receive, after the cache service stores the first data, a write request containing the key value and third data;
the write thread is further configured to, when the first data corresponding to the key value is stored in the underlying storage service, update the first data stored in the underlying storage service with the third data and query the cache service for the key value;
the cache service is further configured to return fourth indication information to the write thread when it determines that the first data is cached;
the write thread is further configured to update the first data stored in the cache service with the third data when it is determined from the query result that the first data is stored in the cache service.
Optionally, the cache service is further configured to divide each set of key value pairs stored in the cache service into different LRU queues according to the number of accesses of the key values, where each set of key value pairs includes a key value and data;
And the cache service is also used for deleting the key value pair from the LRU queue with the lowest access times when the storage capacity of the cache service reaches a preset threshold value.
Optionally, the cache service is further configured to scan each LRU queue at intervals of a predetermined time, and delete key value pairs in each LRU queue that are not accessed beyond a preset duration.
Optionally, the hot loading thread is further configured to obtain, according to the key value, the corresponding lock and pipeline records from the pipeline center, the lock being used to lock the key value;
the hot loading thread is further configured to obtain the first data according to the second data and the pipeline records in the cache service;
the hot loading thread is further configured to delete the key value and the pipeline records in the pipeline center and release the lock.
In summary, in the data processing system provided by this embodiment of the present invention, after the write thread updates the second data stored in the underlying storage service with the first data, the first data still needs to be written to the cache service. If, when the write thread attempts this, the hot loading thread is still loading the second data into the cache service, the write thread stores a pipeline record of the first data in the pipeline center. After the hot loading thread successfully loads the second data, it obtains the first data in the cache service according to the second data and the pipeline records, so the first data also ends up stored in the cache service. This solves the problem that, when hot data in the underlying storage service is periodically synchronized to the cache service, the hot data read from the cache service may not be the latest data, thereby ensuring consistency between the data in the underlying storage service and in the cache service.
If the loading has succeeded and the key value is absent from the pipeline center at query time, the hot loading thread may have deleted the key value after processing the last pipeline record; in that situation the write thread writes the first data corresponding to the key value in the cache service to ensure data consistency.
With the built-in cache service, from the user's perspective only one write thread needs to be triggered to write data into both the underlying storage service and the cache service, whereas in the related art the user must trigger two write threads to write the data into the underlying storage service and the cache service respectively; the complexity of write operations is thus reduced.
The operation side can set the data which is likely to be accessed suddenly in the future as hot data and load the hot data into the cache service through the hot loading thread, so that the requirement of the operation side can be met.
By designing multiple LRU queues and tracking both access count and access time, hot data is identified by access frequency and evicted by access time, avoiding the problem of genuinely hot data being evicted because the cache service runs out of capacity.
An embodiment of the invention also provides an electronic device comprising a data processing system as shown in fig. 1 for performing the data processing method as described in fig. 8-10.
An embodiment of the present invention also provides a computer-readable storage medium having stored therein at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by a processor to implement the data processing method as described in figs. 8-10.
It should be noted that: in the data processing system provided in the above embodiment, only the division of the above functional modules is used for illustration, and in practical application, the above functional allocation may be performed by different functional modules according to needs, that is, the internal structure of the data processing system is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the data processing system and the data processing method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the data processing system and the data processing method are detailed in the method embodiments and are not repeated herein.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description covers only preferred embodiments of the invention and is not intended to limit it; any modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (13)

1. A data processing method, characterized by being used in a data processing system, wherein the data processing system comprises a write thread, a hot loading thread, a cache service, an underlying storage service and a pipeline center, and the cache service is used for storing hot data; the method comprises the following steps:
the write thread receives a write request, wherein the write request comprises first data and a key value;
if second data corresponding to the key value is stored in the underlying storage service, the write thread updates the second data stored in the underlying storage service with the first data, and queries the cache service for the key value;
if it is determined from the query result that the hot loading thread is loading the second data to the cache service and the key value exists in the pipeline center, the write thread stores a pipeline record of the first data corresponding to the key value in the pipeline center, the pipeline center storing the key values of hot data;
after the hot loading thread successfully loads the second data to the cache service, the hot loading thread obtains the pipeline records from the pipeline center according to the key value, obtains the first data in the cache service according to the second data and the pipeline records, and deletes the key value and the pipeline records in the pipeline center;
the cache service stores the first data.
2. The method according to claim 1, wherein the method further comprises:
if it is determined from the query result that the hot loading thread has successfully loaded the second data to the cache service and the key value does not exist in the pipeline center, the write thread writes the first data corresponding to the key value in the cache service.
3. The method of claim 1, wherein the data processing system further comprises a read thread; the method further comprises the steps of:
the read thread receives a read request and queries the cache service for a key value contained in the read request;
when it is determined from the query result that the cache service does not store second data corresponding to the key value and the second data is hot data, the read thread reads the second data from the underlying storage service and returns it to the client, inserts the key value into the pipeline center, generates a heat-up task, and wakes the hot loading thread to execute the heat-up task;
the hot loading thread loads the second data to the cache service under the instruction of the heat-up task.
4. A method according to claim 3, characterized in that the method further comprises:
when the number of times the key value is queried per unit time, as counted by the cache service, satisfies a preset condition, determining that the second data is hot data; or,
when the cache service acquires setting information for indicating that the second data is set as hot data, determining that the second data is hot data.
5. A method according to claim 3, characterized in that the method further comprises:
when it is determined from the query result that the hot loading thread has successfully loaded the second data to the cache service, or that the cache service does not store the second data corresponding to the key value and the second data is not hot data, the read thread reads the second data from the underlying storage service and returns it to the client; or,
when it is determined from the query result that the second data is stored in the cache service, the read thread reads the second data from the cache service and returns it to the client.
6. The method of claim 1, wherein prior to the write thread receiving a write request, the write request containing first data and a key, the method further comprises:
the write thread receives a write request, the write request including a key value and second data;
if the key value is not stored in the underlying storage service, the write thread writes the key value and the second data in the underlying storage service and queries the cache service for the key value;
when it is determined from the query result that the cache service does not store the second data, the write thread does not write the key value and the second data into the cache service.
7. The method of claim 1, wherein after the caching service stores the first data, the method further comprises:
the write thread receives a write request, the write request including a key value and third data;
if the first data corresponding to the key value is stored in the underlying storage service, the write thread updates the first data stored in the underlying storage service with the third data, and queries the cache service for the key value;
And when the first data is stored in the cache service according to the query result, the write thread updates the first data stored in the cache service by using the third data.
8. The method according to claim 1, wherein the method further comprises:
the cache service divides each group of key value pairs stored in the cache service into different least recently used LRU queues according to the access times of the key values, wherein each group of key value pairs comprises the key values and data;
and if the storage capacity of the cache service reaches a preset threshold, deleting the key value pair from the LRU queue with the lowest access times by the cache service.
9. The method of claim 8, wherein the method further comprises:
and the cache service scans each LRU queue at intervals of preset time, and deletes key value pairs which are not accessed and exceed preset time in each LRU queue.
10. The method of any of claims 1 to 9, wherein the hot loading thread obtaining the pipeline records from the pipeline center according to the key value, obtaining the first data in the cache service according to the second data and the pipeline records, and deleting the key value and the pipeline records in the pipeline center comprises:
the hot loading thread obtains the corresponding lock and the pipeline records from the pipeline center according to the key value, wherein the lock is used to lock the key value;
the hot loading thread obtains the first data according to the second data and the pipeline records in the cache service;
the hot loading thread deletes the key value and the pipeline records in the pipeline center and releases the lock.
11. A data processing system, characterized in that the data processing system comprises a write thread, a hot loading thread, a cache service, an underlying storage service, and a pipeline center, the cache service being used for storing hot data;
the write thread is used for receiving a write request, and the write request comprises first data and a key value;
the write thread is further configured to update, when second data corresponding to the key value is stored in the bottom storage service, the second data stored in the bottom storage service by using the first data, and query the cache service for the key value;
the write thread is further configured to store, in the pipelined center, a pipelined of the first data corresponding to the key value when it is determined, according to a query result, that the hot load thread is loading the second data to the cache service and the pipelined center has the key value, the pipelined center storing the key value of the hot data;
The hot loading thread is further configured to obtain the running water from the running water center according to the key value after the hot loading thread successfully loads the second data to the cache service, obtain the first data according to the second data and the running water in the cache service, and delete the key value and the running water in the running water center;
the cache service is configured to store the first data.
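The write thread's branch logic in claim 11 can be sketched as below. The data structures are illustrative assumptions, not the patented implementation: `store` (the underlying storage service) and `cache` are dicts, `loading_keys` is the set of key values the hot loading thread is currently loading, and `pipeline_center` maps key values to lists of pending updates.

```python
def handle_write(key, first_data, store, cache, pipeline_center, loading_keys):
    """Write thread: persist the update, then decide whether the cache can
    be updated directly or the update must be parked as a pipeline entry."""
    if key in store:                      # second data already stored
        store[key] = first_data           # update the underlying storage
        if key in cache:
            cache[key] = first_data       # hot data already cached: update it
        elif key in loading_keys and key in pipeline_center:
            # load in progress: record the update as a pipeline entry so the
            # hot loading thread can replay it after the load completes
            pipeline_center[key].append(first_data)
    else:
        store[key] = first_data           # first write of this key value
```

The point of the pipeline branch is ordering: if the write thread updated the cache while the hot loading thread was still copying the (now stale) second data in, the stale copy could overwrite the fresh value; parking the update and replaying it afterwards avoids that race.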
12. An electronic device for performing the data processing method according to any one of claims 1 to 10.
13. A computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement the data processing method of any of claims 1 to 10.
CN201810058451.9A 2018-01-22 2018-01-22 Data processing method, system, electronic device and storage medium Active CN110109954B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810058451.9A CN110109954B (en) 2018-01-22 2018-01-22 Data processing method, system, electronic device and storage medium


Publications (2)

Publication Number Publication Date
CN110109954A CN110109954A (en) 2019-08-09
CN110109954B true CN110109954B (en) 2023-05-26

Family

ID=67483438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810058451.9A Active CN110109954B (en) 2018-01-22 2018-01-22 Data processing method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN110109954B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113010513A (en) * 2021-03-01 2021-06-22 中国工商银行股份有限公司 Hot loading method and device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101090401A (en) * 2007-05-25 2007-12-19 Kingdee Software (China) Co., Ltd. Data caching method and system in a cluster environment
CN102541757A (en) * 2011-11-30 2012-07-04 Huawei Technologies Co., Ltd. Write cache method, cache synchronization method and device
CN102591864A (en) * 2011-01-06 2012-07-18 Shanghai Yinchen Intelligent Identification Technology Co., Ltd. Data updating method and device in a comparison system
CN104283956A (en) * 2014-09-30 2015-01-14 Tencent Technology (Shenzhen) Co., Ltd. Strongly consistent distributed data storage method, device and system
CN104391862A (en) * 2014-10-23 2015-03-04 Beijing Ruian Technology Co., Ltd. Method and device for updating cache data
CN106959969A (en) * 2016-01-12 2017-07-18 Hundsun Technologies Inc. Data processing method and device
CN107544916A (en) * 2016-06-29 2018-01-05 Maipu Communication Technology Co., Ltd. Caching method and storage device

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007524923A (en) * 2003-05-23 2007-08-30 Washington University Intelligent data storage and processing using FPGA devices
US9251064B2 (en) * 2014-01-08 2016-02-02 Netapp, Inc. NVRAM caching and logging in a storage system
US9152330B2 (en) * 2014-01-09 2015-10-06 Netapp, Inc. NVRAM data organization using self-describing entities for predictable recovery after power-loss


Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Biplob Debnath et al. FlashStore: high throughput persistent key-value store. Proceedings of the VLDB Endowment. 2010, vol. 3, no. 1-2, pp. 1414-1425. *
S. Byan et al. Mercury: Host-side flash caching for the data center. 2012 IEEE 28th Symposium on Mass Storage Systems and Technologies (MSST). 2012, pp. 1-12. *
Li Qing. Research on big data processing based on NoSQL. China Master's Theses Full-text Database, Information Science and Technology. 2014, no. 11, I138-248. *
Su Zhenhua. Design and implementation of a high-performance e-commerce product back-end system based on distributed caching. China Master's Theses Full-text Database, Information Science and Technology. 2015, no. 12, I138-307. *


Similar Documents

Publication Publication Date Title
CN108319654B (en) Computing system, cold and hot data separation method and device, and computer readable storage medium
CN108009008A (en) Data processing method and system, electronic equipment
CN108804031A (en) Best titime is searched
CN101673192B (en) Method for time-sequence data processing, device and system therefor
US20140181119A1 (en) Method and system for accessing files on a storage system
CN111007991B (en) Method for separating read-write requests based on NVDIMM and computer thereof
Yue et al. Building an efficient put-intensive key-value store with skip-tree
CN103246696A (en) High-concurrency database access method and method applied to multi-server system
CN104503703B (en) The treating method and apparatus of caching
CN108021717B (en) Method for implementing lightweight embedded file system
CN107329704B (en) Cache mirroring method and controller
CN113377868A (en) Offline storage system based on distributed KV database
CN111552442A (en) SSD-based cache management system and method
CN1996312A (en) Method and data processing system for managing a mass storage system
CN107888687B (en) Proxy client storage acceleration method and system based on distributed storage system
JP4189342B2 (en) Storage apparatus, storage controller, and write-back cache control method
US10078467B2 (en) Storage device, computer readable recording medium, and storage device control method
CN110109954B (en) Data processing method, system, electronic device and storage medium
CN111475099A (en) Data storage method, device and equipment
CN105892942B (en) Mix operating method, controller and the electronic equipment of storage system
CN115878625A (en) Data processing method and device and electronic equipment
CN109582233A (en) A kind of caching method and device of data
KR101153688B1 (en) Nand flash memory system and method for providing invalidation chance to data pages
CN113835613B (en) File reading method and device, electronic equipment and storage medium
CN110209343B (en) Data storage method, device, server and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant