CN106156176B - Data cleaning method and device - Google Patents

Info

Publication number
CN106156176B
Authority
CN
China
Prior art keywords
service node
data
cleaning
service
data cleaning
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201510184428.0A
Other languages
Chinese (zh)
Other versions
CN106156176A (en
Inventor
孙云清
杨清华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN201510184428.0A priority Critical patent/CN106156176B/en
Publication of CN106156176A publication Critical patent/CN106156176A/en
Application granted granted Critical
Publication of CN106156176B publication Critical patent/CN106156176B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A data cleaning method comprises: receiving a data cleaning request; determining the service node cluster in which the data to be cleaned is located; selecting, from that service node cluster, the service nodes that will execute the data cleaning task; determining a data cleaning strategy for each selected service node according to its weight value; and sending a cleaning instruction to the service nodes, instructing them to clean the data according to the determined data cleaning strategies. A data cleaning device is also provided. The method and device avoid the overload caused when a large number of instantaneous cleaning requests penetrate through to the back-end service node cluster. The cache cleaning operations of the service node clusters are performed in parallel, and the service nodes at every level initiate data cleaning simultaneously within a short time, so that data cleaning of the whole distributed cache system is completed and mutual interference between the levels of the multi-level cache during data cleaning is avoided.

Description

Data cleaning method and device
Technical Field
The invention relates to the technical field of internet, in particular to a data cleaning method and device.
Background
With the development of computer technology and data storage technology, disks are commonly used as a medium for caching data. A running system generates a large amount of cached data, and normal operation can be ensured only if that data is cleaned in time.
In the prior art, data is cleaned by a timed cache cleaning technique. In this approach, a timestamp is recorded for each item of data in the cache. When a transaction accesses the cache, the timestamp is checked first to verify whether the cached data has timed out. If it has, the cached data is invalid and must be cleaned, after which the data is pulled from the data source again.
In the prior art, whether cached data is valid depends entirely on the cache timeout. Some data in the system is never updated, and repeatedly invalidating such data consumes extra bandwidth and input/output. Other data is updated frequently, but because the refresh depends on the timeout threshold, users cannot see an update immediately, and data consistency is weak. When access hotspots are not prominent, cache performance is strongly affected by the timeout threshold: the longer the threshold, the better the caching effect but the worse the data consistency, so a trade-off must be made between consistency and cache performance. Timestamps can also cause cached data to expire collectively; rebuilding all of it at once creates a huge traffic impact on the data source and may even cause a system avalanche. Finally, when a multi-level cache is cleaned on a timer, the caches at different levels are cleaned asynchronously and affect one another, so the cleaning process is uncontrollable.
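The timed approach described above can be sketched as follows. This is only an illustrative sketch, not code from the patent: the class name `TimeoutCache`, the `load_from_source` callback and the injectable `clock` are assumptions introduced here.

```python
import time


class TimeoutCache:
    """Prior-art style cache: every entry carries a timestamp and
    is treated as invalid once it is older than ttl_seconds."""

    def __init__(self, ttl_seconds, load_from_source, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.load = load_from_source  # hypothetical callback that pulls from the data source
        self.clock = clock            # injectable so the behaviour can be tested
        self.entries = {}             # key -> (value, stored_at)

    def get(self, key):
        now = self.clock()
        hit = self.entries.get(key)
        if hit is not None and now - hit[1] < self.ttl:
            return hit[0]             # not timed out: serve the cached value
        value = self.load(key)        # timed out or missing: pull from the source again
        self.entries[key] = (value, now)
        return value
```

Note that the cache reloads an entry after the timeout even if the source never changed, and keeps serving a stale entry until the timeout even if the source changed immediately.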
Disclosure of Invention
In view of this, the present invention provides a data cleaning method and apparatus that clean data in a distributed cache by combining deep and shallow cleaning policies. This reduces the number of data cleaning requests generated and thereby avoids overloading the service node cluster that caches the data when a large number of data cleaning requests penetrate through to it.
The embodiment of the invention provides a data cleaning method, which comprises the following steps:
receiving a data cleaning request, and determining a service node cluster where data pointed by the data cleaning request is located; selecting a service node performing a data cleaning task from the service node cluster; respectively determining a data cleaning strategy of each service node according to the weight value of the selected service node, wherein the data cleaning strategy comprises a deep cleaning strategy and a shallow cleaning strategy, the deep cleaning strategy is used for indicating the service node to clean data cached in a local memory and data cached in the service node cluster, and the shallow cleaning strategy is used for indicating the service node to clean data cached in the local memory; and sending a cleaning instruction to each service node to instruct each service node to clean data according to the determined data cleaning strategy.
An embodiment of the present invention provides a data cleaning apparatus, including:
the receiving module is used for receiving a data cleaning request; the first determining module is used for determining a service node cluster where the data pointed by the data cleaning request is located; the selection module is used for selecting a service node for executing a data cleaning task in the service node cluster; a second determining module, configured to respectively determine a data cleaning policy of each service node according to a weight value of the selected service node, where the data cleaning policy includes a deep cleaning policy and a shallow cleaning policy, the deep cleaning policy is used to instruct the service node to clean data cached in the local memory and data cached in the service node cluster, and the shallow cleaning policy is used to instruct the service node to clean data cached in the local memory; and the sending module is used for sending a cleaning instruction to each service node so as to instruct each service node to clean data according to the determined data cleaning strategy.
According to the data cleaning method and device of the embodiments, the cleaning center determines, according to the weight value of each service node, whether that service node executes the deep cleaning strategy or the shallow cleaning strategy, and the cache in the service node cluster is cleaned by combining the two strategies. This avoids overloading the back-end service node cluster with a large number of instantaneous cleaning requests that penetrate through to it. The cache cleaning operations of the service node clusters are performed in parallel, and the service nodes in the service cluster at each level initiate data cleaning simultaneously within a short time, so that data cleaning of the whole distributed cache system is completed and mutual interference of data updates among the levels of the multi-level cache is avoided.
In order to make the aforementioned and other objects, features and advantages of the invention comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
FIG. 1 is a schematic diagram of a distributed cache system according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a data cleansing system according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating a data cleaning method according to a first embodiment of the present invention;
FIG. 4 is a flowchart illustrating a data cleaning method according to a second embodiment of the present invention;
FIG. 5 is a schematic diagram illustrating an interaction flow of a data cleaning method according to a third embodiment of the present invention;
FIG. 6 is a diagram illustrating a data cleansing apparatus according to a fourth embodiment of the present invention;
FIG. 7 is a diagram illustrating a data cleansing apparatus according to a fifth embodiment of the present invention.
Detailed Description
To further illustrate the technical means and effects of the present invention adopted to achieve the predetermined objects, the following detailed description of the embodiments, structures, features and effects according to the present invention will be made with reference to the accompanying drawings and preferred embodiments.
The data cleaning method can be applied to various distributed cache systems for cleaning various data, and is particularly used for cleaning picture data. The following embodiments take picture data as an example, and the processing procedure of other types of data is the same.
FIG. 1 is a schematic diagram of a distributed cache system according to an embodiment of the present invention. The service nodes share one cache cluster, the TDC (Tencent Disk Cache), which uses disks as the data caching medium. The service nodes and the TDC together form a service node cluster.
As shown in fig. 1, service node clusters are divided into the following two categories:
One is the service node cluster closest to the user (AC), which is composed of a plurality of service nodes directly accessed by the user's application. These service node clusters are usually distributed in different regions; the service capability and service objects of each cluster differ, as do the volume and content of the data it caches.
The other is a larger service node cluster (OC) established above multiple ACs whose service areas are close to one another; its service nodes provide access for users and for the ACs.
During the download of picture data, when a user pulls data through a service node in the service node cluster AC, the application in use generally caches a large amount of data in the AC. When the AC does not hold the data the user requests, it goes back to source by requesting the data from the service node cluster OC. If the OC does not hold the data either, the OC in turn requests it from the data source (Data Center, DC).
Further, after receiving the data it requested, the service node cluster OC returns the data to the downstream service node cluster AC while caching the data in its own TDC. After receiving the data, the AC likewise caches the data in its own TDC while sending it to the user.
On the other hand, when the user actively updates data in the DC, the data cached in the TDCs of both the service node cluster AC and the service node cluster OC is also updated at the same time.
Together, the service node cluster AC, the service node cluster OC and the data source DC form a complete distributed cache system. It should be noted that the distributed cache system shown in fig. 1 is only an example, in which the AC does not exchange data with the DC directly but does so through forwarding by the OC. The distributed cache system may also be implemented with the AC exchanging data with the DC directly, without forwarding through the OC. The working principle is the same in both cases: data is cached synchronously in the service nodes at each level and in their TDCs.
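The back-to-source path from AC to OC to DC can be illustrated with a short sketch. The names `CacheTier`, `fetch` and the sample key are hypothetical, and a plain dictionary stands in for each cluster's TDC.

```python
class CacheTier:
    """One service node cluster; a dict stands in for its shared TDC."""

    def __init__(self, name, upstream):
        self.name = name
        self.tdc = {}
        self.upstream = upstream  # function that fetches from the next level up

    def fetch(self, key):
        if key in self.tdc:
            return self.tdc[key]       # served from this level's cache
        value = self.upstream(key)     # miss: go back to source one level up
        self.tdc[key] = value          # cache the data while returning it
        return value


# DC is the data source, OC sits above AC, and users access AC.
data_center = {"pic.jpg": b"bytes-v1"}
oc = CacheTier("OC", upstream=data_center.__getitem__)
ac = CacheTier("AC", upstream=oc.fetch)
```

After one call to `ac.fetch("pic.jpg")`, both levels hold a copy. If the picture is later changed in `data_center`, both copies are stale until they are cleaned, which is the situation the cleaning method addresses.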
Referring to fig. 2, a data cleansing system according to an embodiment of the present invention may include:
a data update notification module 201, a cleaning center 202, a service node 203 in the AC, a TDC204 of the AC, a service node 205 in the OC, and a TDC206 of the OC.
When the user updates data, the data update notification module 201 actively sends a data update notification to the cleaning center 202, that is, it requests the cleaning center 202 to clean the dirty data in the cache.
The cleaning center 202 determines an AC service node 203, an AC TDC204, an OC service node 205, and an OC TDC206 to which the update data belongs, determines a service node for executing a cleaning task from the AC service node 203 or the OC service node 205, and further determines a data cleaning policy of the service node according to the determined weight value of the service node for executing the cleaning task, that is, determines service nodes for respectively executing a shallow cleaning policy and a deep cleaning policy. The shallow cleaning strategy is used for instructing the service node to clean the data cached in the local memory, and the deep cleaning strategy is used for instructing the service node to clean the data cached in the local memory and the data cached in the service node cluster to which the service node belongs.
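The difference between the two policies can be shown in a few lines. This is a sketch under assumptions: `Policy`, `ServiceNode` and `clean` are illustrative names, and a dictionary shared by the nodes stands in for the cluster's TDC.

```python
from enum import Enum


class Policy(Enum):
    DEEP = "deep"
    SHALLOW = "shallow"


class ServiceNode:
    def __init__(self, name, cluster_cache):
        self.name = name
        self.local_memory = {}              # data cached in this node's memory
        self.cluster_cache = cluster_cache  # cache shared by the whole cluster (TDC)

    def clean(self, key, policy):
        # Both policies clean the node's local memory.
        self.local_memory.pop(key, None)
        # Only the deep policy also cleans the cluster-level cache.
        if policy is Policy.DEEP:
            self.cluster_cache.pop(key, None)
```

Because the TDC is shared, a single deep cleaning per cluster is enough to clear it; the other nodes only need the cheaper shallow cleaning.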
The cleaning center 202 sends data cleaning instructions to the service node 203 of the AC and/or the service node 205 of the OC, which execute the shallow cleaning policy and the deep cleaning policy, respectively, instructing them to clean the data according to the determined cleaning policy.
According to the cleaning instruction, the service node 203 of the AC and the service node 205 of the OC clean their own cached data and the data cached in the TDC204 of the AC and the TDC206 of the OC, respectively.
After the data cleaning is completed, the TDC204 of the AC and the TDC206 of the OC feed back their data cleaning results to the service node 203 of the AC and the service node 205 of the OC, respectively, and the service nodes 203 and 205 feed back those results together with their own data cleaning results to the cleaning center 202. Further, the cleaning center 202 sends the data cleaning result fed back by each service node to the data update notification module 201.
Referring to fig. 3, a data cleaning method according to a first embodiment of the present invention includes:
Step 301, receiving a data cleaning request, and determining a service node cluster where data pointed to by the data cleaning request is located;
The cleaning center 202 receives a data cleaning request. The request may be one that the data update notification module 201 actively initiates to the cleaning center 202 after the user updates data, or a data cleaning request from a user or another device. The cleaning center 202 then determines the service node cluster in which the data to be cleaned is located.
Step 302, selecting a service node for executing a data cleaning task from a service node cluster;
The cleaning center 202 selects, from the determined service node cluster, the service nodes for executing the data cleaning task according to the condition of the data to be cleaned and the number of service nodes required.
Step 303, respectively determining a data cleaning strategy of each service node according to the weight value of the selected service node, wherein the data cleaning strategy comprises a deep cleaning strategy and a shallow cleaning strategy;
The weight value of a service node represents its capability to execute the data cleaning task: the higher the weight value, the better the communication condition of the service node and the higher its success rate in completing the cleaning task. Selecting service nodes with higher weight values to execute the data cleaning task therefore helps the task complete successfully. The cleaning center 202 receives the heartbeat information of the service node and the execution condition information of the data cleaning task, and dynamically adjusts the weight value of the service node according to them.
The deep cleaning strategy is used for indicating the service node to clean the data cached in the local memory and cleaning the data cached in the service node cluster to which the service node belongs.
The shallow cleaning strategy is used for indicating the service node to clean only the data cached in the local memory.
Step 304, sending a cleaning instruction to each service node, and instructing each service node to clean data according to the determined data cleaning strategy.
The cleaning center 202 sends a cleaning instruction to each service node, instructing it to perform data cleaning according to the determined data cleaning strategy. A service node executing the deep cleaning strategy is responsible for cleaning its own cached data and the data cached by its service node cluster; a service node executing the shallow cleaning strategy is responsible for cleaning only its own cached data.
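Steps 301 to 304 can be put together in a minimal end-to-end sketch. Everything named here (`Node`, `CleaningCenter`, `handle_request`, `deep_count`) is an assumption made for illustration; in particular, the sketch selects every node of a matching cluster and lets a fixed number of the highest-weight nodes clean deeply, which is only one possible reading of the embodiment.

```python
class Node:
    def __init__(self, name, weight, tdc):
        self.name = name
        self.weight = weight
        self.memory = {}   # data cached in the node's local memory
        self.tdc = tdc     # cache shared by the node's cluster

    def clean(self, key, deep):
        self.memory.pop(key, None)
        if deep:
            self.tdc.pop(key, None)
        return (self.name, "deep" if deep else "shallow")


class CleaningCenter:
    def __init__(self, clusters, deep_count=1):
        self.clusters = clusters      # cluster name -> list of Node
        self.deep_count = deep_count  # assumed: nodes per cluster that clean deeply

    def handle_request(self, key):
        results = []
        for nodes in self.clusters.values():
            # Step 301: find the clusters that hold the data.
            if not any(key in n.memory or key in n.tdc for n in nodes):
                continue
            # Step 302: select nodes (this sketch simply takes every node).
            ranked = sorted(nodes, key=lambda n: n.weight, reverse=True)
            # Steps 303-304: highest weights clean deeply, the rest shallowly.
            for rank, node in enumerate(ranked):
                results.append(node.clean(key, deep=rank < self.deep_count))
        return results
```

With `deep_count=1`, only one request per cluster reaches the shared cache, while every node still clears its own memory.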
In this embodiment, the cleaning center determines, according to the weight value of each service node, whether the service node executes the deep cleaning policy or the shallow cleaning policy, and the cache in the service node cluster is cleaned by combining the deep and shallow cleaning policies. This avoids overloading the back-end service node cluster with a large number of instantaneous cleaning requests that penetrate through to it. The cache cleaning operations of the service node clusters are performed in parallel, and the service nodes in the service cluster at each level initiate data cleaning simultaneously within a short time, so that data cleaning of the whole distributed cache system is completed and mutual interference of data updates among the levels of the multi-level cache is avoided.
Referring to fig. 4, a data cleaning method according to a second embodiment of the present invention includes:
Step 401, receiving a data cleaning request, and determining a service node cluster where data pointed to by the data cleaning request is located;
The cleaning center 202 receives a data cleaning request. The request may be one that the data update notification module 201 actively initiates to the cleaning center 202 after the user updates data, or a data cleaning request from a user or another device. The cleaning center 202 then determines the service node cluster in which the data to be cleaned is located.
Step 402, determining, according to a preset proportion and the number of service nodes in the service node cluster, the number of service nodes to be selected for executing the data cleaning task;
The cleaning center 202 selects, from the determined service node cluster, the service nodes that will execute the data cleaning task. Specifically, a certain proportion of the service nodes in the cluster is determined as the service nodes to be selected for executing the data cleaning task; the larger the amount of data to be cleaned, the more service nodes are selected for data cleaning.
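As a small illustration of step 402, the count can be derived from the cluster size and the preset proportion. The function name `nodes_to_select`, the rounding up, and the minimum of one node are assumptions of this sketch; the source does not specify how fractional results are handled.

```python
import math


def nodes_to_select(cluster_size, proportion):
    """Number of service nodes to select, given a preset proportion in (0, 1]."""
    if not 0 < proportion <= 1:
        raise ValueError("proportion must be in (0, 1]")
    if cluster_size <= 0:
        return 0
    # Assumed here: round up, and select at least one node.
    return min(cluster_size, max(1, math.ceil(cluster_size * proportion)))
```

For instance, a cluster of 80 nodes with a proportion of 0.25 yields 20 nodes, matching the scale of the 20-node example used in this embodiment.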
Step 403, obtaining the weight values of the service nodes, and arranging the service nodes in order of weight value;
Further, the service nodes that execute the data cleaning task are finally determined in combination with the weight values of the candidate service nodes. The weight value of a service node represents its capability to execute the data cleaning task: the higher the weight value, the better the communication condition of the service node and the higher its success rate in completing the cleaning task. Selecting service nodes with higher weight values to execute the data cleaning task therefore helps the task complete successfully.
The weight value of each service node is obtained, and the service nodes are arranged in order of weight value.
Step 404, determining, from the arranged service nodes, the service nodes that execute the data cleaning task, in the number determined above;
Step 405, arranging the selected service nodes in order of weight value, determining that the service nodes ranked in the first preset number of positions execute the deep cleaning strategy, and that the remaining selected service nodes execute the shallow cleaning strategy;
The deep cleaning strategy is used for indicating the service node to clean the data cached in the local memory and the data cached in the service node cluster to which the service node belongs.
The shallow cleaning strategy is used for indicating the service node to clean only the data cached in the local memory.
For example, suppose 20 service nodes are determined to execute the data cleaning task and are arranged in order of weight value. If 6 service nodes are required to execute the deep cleaning policy, the first 6 of the 20 sorted service nodes execute the deep cleaning policy and the remaining 14 execute the shallow cleaning policy.
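The ranking and split in steps 403 to 405 can be sketched as below. The function `assign_policies` and its parameters are hypothetical names, and descending order of weight is assumed because higher-weight nodes are the ones chosen for deep cleaning.

```python
def assign_policies(weights, select_count, deep_count):
    """weights: node name -> weight value.
    Returns node name -> "deep" or "shallow" for the selected nodes."""
    # Steps 403-404: rank by weight (highest first) and keep the top select_count.
    ranked = sorted(weights, key=weights.get, reverse=True)
    selected = ranked[:select_count]
    # Step 405: the first deep_count selected nodes clean deeply, the rest shallowly.
    return {
        node: ("deep" if rank < deep_count else "shallow")
        for rank, node in enumerate(selected)
    }
```

Calling `assign_policies(weights, 20, 6)` on a cluster of 30 nodes reproduces the example: 20 nodes are selected, the 6 with the highest weights receive the deep policy and the other 14 the shallow policy.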
Further, the cleaning center 202 receives heartbeat information from the service nodes and execution condition information for the data cleaning task, and dynamically adjusts the weight values of the service nodes. Specifically, each time a heartbeat or feedback that a cleaning task was executed successfully is received, an increment is added to the node's current weight value. For example, when heartbeat information is received from node A, its current weight value is increased by 1; when feedback that a data cleaning task was executed successfully is received from node B, its current weight value is increased by 1; when both heartbeat information and feedback that a data cleaning task was executed successfully are received from node C, its current weight value is increased by 2.
The higher the weight value, the better the communication condition of the service node and the higher its success rate in completing the data cleaning task. Service nodes with higher weight values are therefore selected to execute the deep cleaning policy.
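The weight adjustment can be sketched with a simple counter table. The class `WeightTable` and its method names are illustrative assumptions; the increment of 1 follows the example in the text.

```python
from collections import defaultdict


class WeightTable:
    """Tracks a weight value per service node, starting from 0."""

    def __init__(self, increment=1):
        self.increment = increment
        self.weights = defaultdict(int)

    def on_heartbeat(self, node):
        # A received heartbeat raises the node's weight.
        self.weights[node] += self.increment

    def on_clean_success(self, node):
        # Feedback that a cleaning task succeeded raises it as well.
        self.weights[node] += self.increment
```

A node that reports both a heartbeat and a successful cleaning task therefore gains 2, as node C does in the example. The source describes only increments; any decay or penalty for missed heartbeats would be an addition beyond it.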
Step 406, sending a cleaning instruction to each service node, and instructing each service node to clean data according to the determined data cleaning strategy;
The cleaning center 202 sends a cleaning instruction to each service node, instructing it to perform data cleaning according to the determined data cleaning strategy. A service node executing the deep cleaning strategy is responsible for cleaning its own cached data and the data cached by its service node cluster; a service node executing the shallow cleaning strategy is responsible for cleaning only its own cached data.
Step 407, receiving data cleaning result information returned by the service node executing the data cleaning task.
After the service node completes the data cleaning task, the service node replies the data cleaning result to the cleaning center 202.
In this embodiment, the cleaning center determines, according to the weight value of each service node, whether the service node executes the deep cleaning policy or the shallow cleaning policy, and the cache in the service node cluster is cleaned by combining the deep and shallow cleaning policies. This avoids overloading the back-end service node cluster with a large number of instantaneous cleaning requests that penetrate through to it. The cache cleaning operations of the service node clusters are performed in parallel, and the service nodes in the service cluster at each level initiate data cleaning simultaneously within a short time, so that data cleaning of the whole distributed cache system is completed and mutual interference of data updates among the levels of the multi-level cache is avoided.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating an interaction flow between devices in a data cleansing system according to an embodiment of the present invention, and the system includes, according to the data cleansing system in fig. 2: a data update notification module 201, a cleaning center 202, a service node 203 in the AC, a TDC204 of the AC, a service node 205 in the OC, and a TDC206 of the OC.
Step 501, data updating notification;
When the data update notification module 201 obtains information that data has been updated, it sends a data update notification to the cleaning center 202.
Step 502, determining a cleaning strategy;
The cleaning center 202 determines the AC service node 203, the AC TDC204, the OC service node 205, and the OC TDC206 to which the updated data belongs, determines the service nodes that execute the cleaning task among the AC service node 203 or the OC service node 205, and determines the data cleaning policy of those service nodes according to the weight values of the AC service node 203 and the OC service node 205, that is, determines which service nodes execute the shallow cleaning policy and which execute the deep cleaning policy.
The shallow cleaning strategy is used for indicating the service node to clean the data cached in the local memory, and the deep cleaning strategy is used for indicating the service node to clean the data cached in the local memory and the data cached in the service node cluster to which the service node belongs.
Step 503, sending a cleaning instruction;
The cleaning center 202 sends data cleaning instructions to the service node 203 of the AC and the service node 205 of the OC that execute the shallow cleaning strategy and the deep cleaning strategy respectively, and instructs them to clean the data according to the determined cleaning strategy.
Step 504, cleaning data;
According to the cleaning instruction, the service node 203 of the AC and the service node 205 of the OC clean their own cached data and the data cached in the TDC204 of the AC and the TDC206 of the OC, respectively.
Step 505, feeding back a result;
The TDC204 of the AC and the TDC206 of the OC feed back their data cleaning results to the service node 203 of the AC and the service node 205 of the OC, respectively, and the service nodes 203 and 205 feed back those results together with their own data cleaning results to the cleaning center 202.
Step 506, sending the data cleaning result;
The cleaning center 202 sends the data cleaning result fed back by each service node to the data update notification module 201.
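The feedback path of steps 505 and 506 can be sketched as each service node producing a report and the cleaning center summarizing the reports before forwarding them. The report fields and the functions `node_report` and `summarize` are assumptions of this sketch; the source does not define a result format.

```python
def node_report(node, memory_cleaned, tdc_cleaned=None):
    """Result a service node feeds back to the cleaning center.
    tdc_cleaned is set only when the node ran the deep policy."""
    report = {"node": node, "memory_cleaned": memory_cleaned}
    if tdc_cleaned is not None:
        report["tdc_cleaned"] = tdc_cleaned
    return report


def summarize(reports):
    """Summary the cleaning center could forward to the notification module."""
    failed = [
        r["node"] for r in reports
        if not r["memory_cleaned"] or r.get("tdc_cleaned") is False
    ]
    return {"total": len(reports), "failed": failed, "success": not failed}
```

A summary of this kind would also give the cleaning center the success feedback it uses when adjusting weight values.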
Referring to fig. 6, an embodiment of the present invention further provides a data cleansing apparatus, where the apparatus includes:
a receiving module 601, configured to receive a data cleaning request;
a first determining module 602, configured to determine a service node cluster where data pointed by the data cleaning request is located;
a selecting module 603, configured to select a service node performing a data cleaning task from the service node cluster determined by the first determining module 602;
a second determining module 604, configured to determine a data cleaning policy of each service node according to the weight value of the selected service node;
the data cleaning strategy comprises a deep cleaning strategy and a shallow cleaning strategy, wherein the deep cleaning strategy is used for indicating the service node to clean the data cached in the local memory and the data cached in the service node cluster to which the service node belongs; the shallow cleaning strategy is used for instructing the service node to clean the data cached in the local memory;
a sending module 605, configured to send a cleaning instruction to each service node, so as to instruct each service node to perform data cleaning according to the determined data cleaning policy.
For other details of the technical solution for implementing each module in the apparatus of this embodiment, reference may be made to the description in the embodiment shown in fig. 3, which is not described herein again.
In this embodiment, the cleaning center determines, according to the weight value of each service node, whether the service node executes the deep cleaning policy or the shallow cleaning policy, and the cache in the service node cluster is cleaned by combining the deep and shallow cleaning policies. This avoids overloading the back-end service node cluster with a large number of instantaneous cleaning requests that penetrate through to it. The cache cleaning operations of the service node clusters are performed in parallel, and the service nodes in the service cluster at each level initiate data cleaning simultaneously within a short time, so that data cleaning of the whole distributed cache system is completed and mutual interference of data updates among the levels of the multi-level cache is avoided.
Referring to fig. 7, an embodiment of the present invention further provides a data cleansing apparatus, where the apparatus includes:
a receiving module 701, configured to receive a data cleaning request;
a first determining module 702, configured to determine a service node cluster where the data pointed by the data cleaning request is located;
a selecting module 703, configured to select a service node performing a data cleaning task from the service node cluster determined by the first determining module 702;
a second determining module 704, configured to determine the data cleaning policy of each service node according to the weight value of the selected service node.
The data cleaning strategy comprises a deep cleaning strategy and a shallow cleaning strategy, wherein the deep cleaning strategy is used for indicating the service node to clean the data cached in the local memory and the data cached in the service node cluster to which the service node belongs; the shallow cleaning strategy is used for instructing the service node to clean the data cached in the local memory;
a sending module 705, configured to send a cleaning instruction to each service node, so as to instruct each service node to perform data cleaning according to the determined data cleaning policy.
Further, the selecting module 703 includes:
a first determining submodule 7031, configured to determine, according to a preset proportion and the number of service nodes in the service node cluster, the number of service nodes to be selected for executing the data cleaning task;
an obtaining module 7032, configured to obtain a weight value of each service node;
the first determining submodule 7031 is further configured to arrange the service nodes in order of weight value and to determine, from the arranged service nodes, that number of service nodes to execute the data cleaning task.
Further, the second determining module 704 includes:
an arranging module 7041, configured to arrange the selected service nodes in order of weight value;
a second determining submodule 7042, configured to determine that, among the service nodes arranged by the arranging module 7041, the service nodes in the first preset number of positions execute the deep cleaning policy, and that the remaining selected service nodes execute the shallow cleaning policy.
The receiving module 701 is further configured to receive heartbeat information from each service node and feedback information on the execution of the data cleaning task.
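The policy assignment can be sketched in a few lines. The function name `assign_policies`, the `deep_count` parameter, the string labels, and the descending sort direction are assumptions introduced for illustration; the text only specifies that a preset number of nodes at the front of the weight ordering receive the deep cleaning policy and the rest receive the shallow one.

```python
def assign_policies(selected_weights, deep_count):
    """Map each selected node to "deep" or "shallow".

    selected_weights: dict mapping node name -> weight value (selected nodes only)
    deep_count:       preset number of front-ranked nodes that clean deeply
    """
    # Assumption: higher weight ranks first; ties are broken by node name.
    ranked = sorted(selected_weights, key=lambda n: (-selected_weights[n], n))
    return {
        name: ("deep" if position < deep_count else "shallow")
        for position, name in enumerate(ranked)
    }


policies = assign_policies({"n1": 5, "n2": 9, "n4": 7}, deep_count=1)
```

Keeping `deep_count` small means that only a few nodes issue cleaning requests against the shared cache, while every selected node still clears its own local memory.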
The apparatus in this embodiment may further include:
a weight adjusting module 706, configured to add an incremental value to the weight value of a service node each time heartbeat information, or feedback information indicating that the data cleaning task was executed successfully, is received for that service node.
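A minimal sketch of this weight adjustment is shown below. The `WeightTable` class, its method names, the default increment of 1, and the starting weight of 0 are assumptions; the text states only that an incremental value is added on each heartbeat and on each successful task feedback.

```python
class WeightTable:
    """Hypothetical bookkeeping for service node weight values."""

    def __init__(self, increment=1):
        self.increment = increment  # assumed fixed incremental value
        self.weights = {}           # node name -> weight value, starting from 0

    def _bump(self, node):
        self.weights[node] = self.weights.get(node, 0) + self.increment

    def on_heartbeat(self, node):
        # Every heartbeat received raises the node's weight.
        self._bump(node)

    def on_task_feedback(self, node, success):
        # Only feedback reporting a successful cleaning task raises the weight.
        if success:
            self._bump(node)


table = WeightTable(increment=2)
table.on_heartbeat("n1")
table.on_task_feedback("n1", success=True)
table.on_task_feedback("n1", success=False)
table.on_heartbeat("n2")
```

Under this scheme a node that reports in regularly and completes its cleaning tasks accumulates a larger weight value than one that is silent or failing, so the weight ordering reflects how responsive each node has been.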
The receiving module 701 is further configured to receive data cleaning result information returned by the service node executing the data cleaning task.
For other details of the technical solution for implementing each module in the apparatus of this embodiment, reference may be made to the description of the embodiment shown in fig. 1 and fig. 6, which is not repeated herein.
In this embodiment, the cleaning center determines, according to the weight value of each service node, whether that service node executes the deep cleaning policy or the shallow cleaning policy, and cleans the cache in the service node cluster by combining the two policies. This prevents a large number of instantaneous cleaning requests from penetrating to the back-end service node cluster and overloading it. The cache cleaning operations of the service node clusters are performed in parallel, and the service nodes in the service node cluster at each level initiate data cleaning at the same time within a short period, so that data cleaning of the whole distributed cache system is completed and the multi-level caches do not affect one another during data updating.
In addition, an embodiment of the present invention further provides a computer-readable storage medium in which computer-executable instructions are stored. The computer-readable storage medium is, for example, a non-volatile memory such as an optical disc, a hard disk, or a flash memory. The computer-executable instructions cause a computer or a similar computing device to perform the operations of the data cleaning method described above.
Although the present invention has been described with reference to the preferred embodiments, it is to be understood that the invention is not limited to the disclosed embodiments, but is intended to cover various modifications, equivalents and alternatives falling within the spirit and scope of the invention.

Claims (9)

1. A data cleaning method, comprising:
receiving a data cleaning request, and determining the service node cluster in which the data targeted by the data cleaning request is located, wherein the service node cluster is composed of a plurality of service nodes and a cache cluster, and the plurality of service nodes in the service node cluster share the cache cluster in the service node cluster;
selecting a service node performing a data cleaning task from the service node cluster;
respectively determining a data cleaning strategy of each service node according to the weight value of the selected service node, wherein the data cleaning strategy comprises a deep cleaning strategy and a shallow cleaning strategy, the deep cleaning strategy is used for indicating the service node to clean data cached in a local memory and data cached in the service node cluster, and the shallow cleaning strategy is used for indicating the service node to clean data cached in the local memory;
sending a cleaning instruction to each service node to instruct each service node to clean data according to the determined data cleaning strategy;
receiving heartbeat information and data cleaning task execution feedback information from each service node;
and adding an increment value to the weight value of a service node each time the heartbeat information, or feedback information indicating successful execution of the data cleaning task, is received for that service node.
2. The method of claim 1, wherein selecting the service node in the service node cluster to perform the data cleaning task comprises:
determining the number of service nodes to be selected to execute the data cleaning task according to the number of service nodes in the service node cluster and a preset proportion;
acquiring the weight value of each service node, and arranging the service nodes in order of weight value;
and determining, from the arranged service nodes, that number of service nodes to execute the data cleaning task.
3. The method of claim 2, wherein the determining the data cleaning policy of each service node according to the weight value of the selected service node comprises:
arranging the selected service nodes in order of weight value, determining that the service nodes in the first preset number of positions execute the deep cleaning strategy, and determining that the remaining selected service nodes execute the shallow cleaning strategy.
4. The method according to any one of claims 1-3, further comprising, after the sending of a cleaning instruction to each service node:
and receiving data cleaning result information returned by the service node executing the data cleaning task.
5. A data cleaning apparatus, comprising:
a receiving module, configured to receive a data cleaning request, and to receive heartbeat information and data cleaning task execution feedback information from each service node;
a first determining module, configured to determine the service node cluster in which the data targeted by the data cleaning request is located, where the service node cluster is formed by a plurality of service nodes and a cache cluster, and the plurality of service nodes in the service node cluster share the cache cluster in the service node cluster;
a selection module, configured to select, from the service node cluster, the service nodes that are to execute the data cleaning task;
a second determining module, configured to respectively determine a data cleaning policy of each service node according to a weight value of the selected service node, where the data cleaning policy includes a deep cleaning policy and a shallow cleaning policy, the deep cleaning policy is used to instruct the service node to clean data cached in the local memory and data cached in the service node cluster, and the shallow cleaning policy is used to instruct the service node to clean data cached in the local memory;
a sending module, configured to send a cleaning instruction to each service node, so as to instruct each service node to perform data cleaning according to the determined data cleaning policy;
and a weight adjusting module, configured to add an increment value to the weight value of a service node each time the heartbeat information, or feedback information indicating successful execution of the data cleaning task, is received for that service node.
6. The apparatus of claim 5, wherein the selection module comprises:
a first determining submodule, configured to determine the number of service nodes to be selected to execute the data cleaning task according to the number of service nodes in the service node cluster and a preset proportion;
an acquisition module, configured to acquire the weight value of each service node;
the first determining submodule is further configured to arrange the service nodes in order of weight value and to determine, from the arranged service nodes, that number of service nodes to execute the data cleaning task.
7. The apparatus of claim 5 or 6, wherein the second determining module comprises:
an arrangement module, configured to arrange the selected service nodes in order of weight value;
and a second determining submodule, configured to determine that, among the service nodes arranged by the arrangement module, the service nodes in the first preset number of positions execute the deep cleaning strategy, and that the remaining selected service nodes execute the shallow cleaning strategy.
8. The apparatus of claim 5, wherein the receiving module is further configured to receive data cleaning result information returned by a service node executing a data cleaning task.
9. A computer-readable storage medium, in which one or more programs are stored, the programs being used for executing the data cleaning method according to any one of claims 1 to 4.
CN201510184428.0A 2015-04-17 2015-04-17 Data cleaning method and device Active CN106156176B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510184428.0A CN106156176B (en) 2015-04-17 2015-04-17 Data cleaning method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510184428.0A CN106156176B (en) 2015-04-17 2015-04-17 Data cleaning method and device

Publications (2)

Publication Number Publication Date
CN106156176A CN106156176A (en) 2016-11-23
CN106156176B true CN106156176B (en) 2020-04-24

Family

ID=58057863

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510184428.0A Active CN106156176B (en) 2015-04-17 2015-04-17 Data cleaning method and device

Country Status (1)

Country Link
CN (1) CN106156176B (en)

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101562543B (en) * 2009-05-25 2013-07-31 阿里巴巴集团控股有限公司 Cache data processing method and processing system and device thereof
CN102724314B (en) * 2012-06-20 2015-08-05 深圳市远行科技有限公司 A kind of distributed caching client based on metadata management
CN102932472B (en) * 2012-11-13 2017-01-25 深圳市路友网络科技有限公司 Data processing method and communication system
CN103945005B (en) * 2014-05-06 2017-06-13 山西睿信智达传媒科技股份有限公司 Dynamic load leveling framework based on many evaluation indexes
CN104317658B (en) * 2014-10-17 2018-06-12 华中科技大学 A kind of loaded self-adaptive method for scheduling task based on MapReduce

Also Published As

Publication number Publication date
CN106156176A (en) 2016-11-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant