CN112487009A - Data updating method, device, equipment, storage medium and program product - Google Patents

Data updating method, device, equipment, storage medium and program product

Info

Publication number
CN112487009A
Authority
CN
China
Prior art keywords
data
bloom filter
period
hash value
updating
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011474674.7A
Other languages
Chinese (zh)
Inventor
李海波
何兰州
肖勤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ByteDance Network Technology Co Ltd
Original Assignee
Beijing ByteDance Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ByteDance Network Technology Co Ltd
Priority to CN202011474674.7A
Publication of CN112487009A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2365 Ensuring data consistency and integrity
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosed embodiments provide a data updating method, apparatus, device, storage medium and program product, the method comprising: determining a data ID corresponding to the updated at least one data; determining a hash value corresponding to each data ID according to a preset rule; and configuring a bloom filter according to the hash value corresponding to each data ID, so that a second computing node in the distributed system determines the data needing to be updated according to the bloom filter. The data updating method, the data updating device, the data updating equipment, the data updating storage medium and the program product can realize the transmission of the updating information by constructing the bloom filter by using the data ID, do not need to update data in a one-by-one comparison mode, improve the data updating efficiency and reduce the communication cost.

Description

Data updating method, device, equipment, storage medium and program product
Technical Field
The disclosed embodiments relate to the field of computer technologies, and in particular, to a data updating method, apparatus, device, storage medium, and program product.
Background
In a distributed system, multiple compute nodes may use an external storage device in common to store data. For hot spot data which is frequently accessed, each computing node can cache a copy of the data in a local memory, so that the phenomenon that a large amount of network resources are consumed and delay is increased due to repeated data reading from an external storage device during computing is avoided. After a certain computing node updates the original data in the external storage device, in order to ensure data consistency, the data copies of other computing nodes also need to be updated correspondingly.
At present, when data is updated, the computing nodes in the distributed system need to compare the data one by one, which results in low updating efficiency.
Disclosure of Invention
The embodiment of the disclosure provides a data updating method, a data updating device, a data updating apparatus, a storage medium and a program product, so as to solve the technical problem of low data updating efficiency in a distributed system.
In a first aspect, an embodiment of the present disclosure provides a data updating method applied to a first computing node in a distributed system, where the method includes:
determining a data ID corresponding to the updated at least one data;
determining a hash value corresponding to each data ID according to a preset rule;
and configuring a bloom filter according to the hash value corresponding to each data ID, so that a second computing node in the distributed system determines the data needing to be updated according to the bloom filter.
In a second aspect, an embodiment of the present disclosure provides a data update apparatus applied to a first computing node in a distributed system, where the apparatus includes:
the determining module is used for determining a data ID corresponding to the updated at least one data;
the calculation module is used for determining hash values corresponding to the data IDs according to preset rules;
and the configuration module is used for configuring the bloom filter according to the hash value corresponding to each data ID, so that the second computing node in the distributed system determines the data needing to be updated according to the bloom filter.
In a third aspect, an embodiment of the present disclosure provides an electronic device, including: a memory and at least one processor;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the data update method as described above in the first aspect and various possible designs of the first aspect.
In a fourth aspect, the embodiments of the present disclosure provide a computer-readable storage medium, in which computer-executable instructions are stored, and when a processor executes the computer-executable instructions, the data updating method according to the first aspect and various possible designs of the first aspect is implemented.
In a fifth aspect, embodiments of the present disclosure provide a computer program product comprising a computer program that, when executed by a processor, implements a data update method as described above in the first aspect and various possible designs of the first aspect.
According to the data updating method, the data updating device, the data updating equipment, the data updating storage medium and the program product, the data ID corresponding to at least one piece of updated data can be determined through the first computing node in the distributed system, the hash value corresponding to each data ID is determined according to the preset rule, and the bloom filter is configured according to the hash value corresponding to each data ID, so that the second computing node in the distributed system can determine the data needing to be updated according to the bloom filter, corresponding data cached in the second computing node is updated, the bloom filter can be constructed by using the data ID to realize transmission of updating information, the data do not need to be updated in a one-by-one comparison mode, the data updating efficiency is improved, and the communication cost is reduced.
Drawings
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present disclosure, and other drawings can be obtained according to these drawings by those skilled in the art without inventive exercise.
Fig. 1 is a schematic view of an application scenario provided in the embodiment of the present disclosure;
Fig. 2 is a schematic flow chart of a data updating method provided by an embodiment of the present disclosure;
Fig. 3 is a schematic diagram of a bloom filter provided by an embodiment of the present disclosure;
Fig. 4 is a diagram illustrating a hardware architecture according to an embodiment of the present disclosure;
Fig. 5 is a schematic flow chart of configuring a bloom filter provided by an embodiment of the present disclosure;
Fig. 6 is a schematic diagram of a bloom filter configured corresponding to an update period according to an embodiment of the present disclosure;
Fig. 7 is a schematic diagram of a bloom filter corresponding to an update period and hash value settings provided by an embodiment of the present disclosure;
Fig. 8 is a schematic diagram of determining identification information of a bloom filter during data update according to an embodiment of the present disclosure;
Fig. 9 is a schematic diagram of determining identification information of a bloom filter when a query is updated according to an embodiment of the present disclosure;
Fig. 10 is a block diagram illustrating a structure of a data updating apparatus according to an embodiment of the present disclosure;
Fig. 11 is a block diagram of an electronic device according to an embodiment of the present disclosure.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are some, but not all embodiments of the present disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments disclosed herein without making any creative effort, shall fall within the protection scope of the present disclosure.
The technical scheme provided by the embodiment of the disclosure can be applied to any field needing data storage and data updating, and the data can be any type of data.
Fig. 1 is a schematic view of an application scenario provided in the embodiment of the present disclosure. As shown in fig. 1, a plurality of computing nodes are provided in a distributed system, each computing node may communicate with an external storage device in a wired or wireless manner, the external storage device may be used to store data, and the computing node updates data cached by itself through interaction with the external storage device, so that the data cached by itself is consistent with original data in the external storage device.
The computing nodes can also communicate with the terminal device, taking the data as video data as an example, the video data is stored in an external storage device, and after each computing node caches the video data, the cached video data can be used for interacting with the terminal device, for example, a video is sent to the terminal device for playing.
When any computing node determines that certain video data has changed, for example, when the state of a video changes to playback-restricted, the changed video data needs to be written to the external storage device, and the other computing nodes then need to update their cached copies of the video data accordingly, so as to avoid data inconsistency.
Since a distributed system often needs to process massive data, and the data is updated frequently, an efficient data updating method is needed to achieve final consistency of the data.
In some techniques, a short expiration time may be set for cached data within each compute node, and when a compute node detects that the data has expired when querying the data, the locally cached data is reloaded and updated from an external storage device. The disadvantage of this scheme is that when the cached data is invalid, the query operation is transferred to the external storage device, which causes a large burden and increases the query delay.
In other techniques, a background thread may be used to poll raw data from an external storage device and update the local cache. The disadvantage of this solution is that all data can only be updated periodically and in full, while some data are not actually updated in the external storage device, and repeating full updates will cause more additional network load and stress on the external storage device.
In view of this, the embodiments of the present disclosure provide a data updating method, where a computing node may calculate a corresponding hash value according to an ID of updated data, and configure a corresponding bloom filter based on the hash value, and other computing nodes may determine an ID of data that needs to be updated according to the bloom filter, and obtain updated data corresponding to the data IDs from an external storage device, so as to construct the bloom filter by using the updated data ID to transmit update information, and do not need to update the data in a one-by-one comparison manner, thereby improving data updating efficiency and reducing communication cost.
Some embodiments of the disclosure are described in detail below with reference to the accompanying drawings. The features of the embodiments and examples described below may be combined with each other without conflict between the embodiments.
Fig. 2 is a schematic flow chart of a data updating method according to an embodiment of the present disclosure. The method of the present embodiment may be applied to a first computing node in a distributed system. As shown in fig. 2, the data updating method may include:
Step 201, determining a data ID corresponding to at least one updated data.
Alternatively, the data may be stored in a data storage device, such as the external storage device shown in FIG. 1. The first computing node may update data in the data storage device according to business needs.
Each data is associated with a data ID (IDentity) by which the corresponding data can be found. Optionally, a storage space may be allocated to each data, the storage space corresponding to the data may be determined by the data ID, and the data corresponding to the data ID may be read from the storage space.
In this step, at least one data ID that is updated may be determined. For the sake of simplicity, the updated data ID described in the embodiments of the present disclosure may refer to a data ID corresponding to the updated data, rather than indicating that the data ID itself is updated.
Step 202, determining a hash value corresponding to each data ID according to a preset rule.
Each data ID may correspond to at least one hash value. Optionally, determining the hash value corresponding to each data ID according to a preset rule may include: for each data ID, applying a plurality of preset hash functions to the data ID to obtain a plurality of corresponding hash values.
A hash function converts an input of arbitrary length into an output of fixed length (i.e., a hash value), thereby implementing a compression mapping. In this embodiment, the hash function may be constructed by any method, such as direct addressing, the square-sum method, the division-remainder method, and the like.
For each data ID corresponding to updated data, K hash values of the data ID, $h_0, h_1, \dots, h_{K-1}$, may be calculated using K hash functions, where K is an integer greater than or equal to 1.
Optionally, the value space of the hash values may be constrained, for example to [0, 4095], so that a 512-byte binary vector (4096 bits) suffices for storage. The hash functions thus compress the data IDs into a smaller storage space, reducing the overall space required for the data update process.
It should be noted that, for the same hash function, if two hash values differ, the original inputs of the two hash values, i.e., the data IDs, also differ. If two hash values are identical, however, it does not necessarily follow that the original inputs are identical.
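Step 202 can be illustrated with a minimal Python sketch. The text does not fix the hash functions, so the construction here (SHA-256 salted with the function index, reduced modulo the value space) is an assumption chosen only for illustration, as are the names `hash_values`, `M`, and `K` and the sample ID.

```python
import hashlib

M = 4096  # size of the hash value space, i.e. values fall in [0, 4095]
K = 3     # number of hash functions (assumed; the text only requires K >= 1)


def hash_values(data_id: str, k: int = K, m: int = M) -> list[int]:
    """Return k hash values in [0, m - 1] for one data ID."""
    values = []
    for i in range(k):
        # Salting the input with the index i yields k different functions.
        digest = hashlib.sha256(f"{i}:{data_id}".encode("utf-8")).digest()
        values.append(int.from_bytes(digest[:8], "big") % m)
    return values


print(hash_values("video:12345"))  # three integers, each in [0, 4095]
print(M // 8)                      # 512 bytes hold a 4096-bit vector
```

Any deterministic family of functions would serve, provided every computing node applies the same preset rule; otherwise the bits set by one node could not be looked up by another.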
Step 203, configuring a bloom filter according to the hash value corresponding to each data ID, so that the second computing node in the distributed system determines the data to be updated according to the bloom filter.
Wherein a bloom filter may be used to determine whether an element is in a set. The bloom filter includes a plurality of bits, and configuring the bloom filter may refer to updating the corresponding bits of the bloom filter according to the hash value.
Specifically, the initial value of each bit of the bloom filter may be a first value. When the bloom filter is configured according to the hash values, the H-th bit of the bloom filter corresponding to the data ID may be updated to a second value for each hash value of the data ID, where H traverses the hash values corresponding to the data ID, that is, H takes the values $h_0, h_1, \dots, h_{K-1}$.
For convenience of description, in the embodiments of the present disclosure, the first numerical value is 0, and the second numerical value is 1.
Fig. 3 is a schematic diagram of a bloom filter according to an embodiment of the present disclosure. As shown in FIG. 3, the bloom filter has 10 bits, each bit being set to 0 at initialization. When the bloom filter is configured according to the data ID, the corresponding position of the bloom filter may be set to 1 according to the hash value of the data ID.
Assuming that two data IDs exist and are respectively marked as x and y, the preset rule comprises three hash functions, and after each data ID is calculated, three hash values can be obtained, wherein the value space of the hash values is 0-9 and is consistent with the size of the storage space of the bloom filter.
When the bloom filter is configured according to the obtained hash values, assuming that the three hash values corresponding to data ID x are 1, 2, and 5, bits 1, 2, and 5 of the bloom filter may be set to 1; if the hash values corresponding to data ID y are 5, 7, and 8, bits 5, 7, and 8 of the bloom filter may be set to 1.
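The 10-bit example can be reproduced with a short Python sketch. The hash values are taken directly from the example above; the list representation and the function name `configure` are illustrative assumptions.

```python
def configure(bits: list[int], hash_values: list[int]) -> None:
    """Set the bit at every hash value position to 1 (the second value)."""
    for h in hash_values:
        bits[h] = 1


bloom = [0] * 10              # every bit starts at 0 (the first value)
configure(bloom, [1, 2, 5])   # hash values of data ID x
configure(bloom, [5, 7, 8])   # hash values of data ID y
print(bloom)                  # [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]
```

Bit 5 is shared by x and y, so setting it twice has no further effect.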
Other compute nodes in the distributed system, for example, the second compute node, may determine whether the data in the memory cache of the second compute node needs to be updated according to the configured bloom filter.
Optionally, the second computing node may obtain a data ID corresponding to at least one data in the local memory cache; determining a hash value corresponding to each data ID according to a preset rule; and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter.
Specifically, it may be checked whether the bits at the positions corresponding to the hash values in the bloom filter are all 1. If all of them are 1, the data needs to be updated; if at least one of them is not 1, the data does not need to be updated.
For example, assuming that the data ID in the memory cache of the second computing node includes y, the hash value corresponding to y is calculated to be 5, 7, and 8 by using the hash function used when the bloom filter is configured, and then the hash value is searched in the bloom filter, and it is found that bits 5, 7, and 8 of the bloom filter are all 1, which indicates that the data corresponding to y may be updated.
Assuming that the data ID in the memory cache of the second computing node further includes z, calculating to obtain hash values corresponding to z as 5, 7, and 9 by using a hash function used when the bloom filter is configured, and then searching in the bloom filter, finding that bits 5 and 7 of the bloom filter are 1, but bit 9 is 0, which indicates that data corresponding to z is not updated.
For a certain cached data ID, when it is determined by the bloom filter that updating is required, the updated data may be obtained from the external storage device, and the data in the memory cache may be updated according to the obtained data, so that the data in the memory cache of the second computing node and the data of the external storage device are kept consistent.
In addition, the bits of the bloom filter corresponding to the hash values of a data ID that has not been updated may all be 1. For example, assume a data ID v whose hash values, calculated with the hash functions described above, are 1, 5, and 7. Bits 1, 5, and 7 of the bloom filter are all 1, yet the data corresponding to v has not actually been updated; these bits are 1 because the data corresponding to data IDs x and y was updated. In this case, data that has not been updated is mistaken for updated data and is therefore requested from the external storage device. This does not affect data consistency and does not cause much extra burden, so such an error is acceptable.
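The query side of the same example, including the false positive for v, can be sketched as follows. The filter state and the hash values come from the example above; the function name `may_need_update` is an illustrative assumption.

```python
def may_need_update(bits: list[int], hash_values: list[int]) -> bool:
    """True only if every bit addressed by the hash values is 1."""
    return all(bits[h] == 1 for h in hash_values)


# Filter state after data IDs x (1, 2, 5) and y (5, 7, 8) were configured.
bloom = [0, 1, 1, 0, 0, 1, 0, 1, 1, 0]

print(may_need_update(bloom, [5, 7, 8]))  # y: True, the data was updated
print(may_need_update(bloom, [5, 7, 9]))  # z: False, bit 9 is still 0
print(may_need_update(bloom, [1, 5, 7]))  # v: True, a false positive
```

A `False` result is always reliable, whereas a `True` result only means the data may have been updated; the cost of a false positive is one unnecessary read from the external storage device.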
In the data updating method provided by this embodiment, a data ID corresponding to at least one updated data may be determined by a first computing node in a distributed system, a hash value corresponding to each data ID is determined according to a preset rule, and a bloom filter is configured according to the hash value corresponding to each data ID, so that a second computing node in the distributed system may determine data to be updated according to the bloom filter, thereby updating corresponding data cached in the second computing node, and a bloom filter may be constructed by using the data ID to implement transmission of update information without updating the data in a manner of comparison one by one, thereby improving efficiency of data updating and reducing communication cost.
Based on the technical solution provided in the foregoing embodiment, optionally, the bloom filter may be directly transmitted to the second computing node by the first computing node, or the bloom filter may be stored in another device, the first computing node may update the bloom filter in the other device, and the second computing node may read the bloom filter in the other device.
Optionally, the bloom filter may be stored in a first storage device. Correspondingly, configuring a bloom filter according to the hash value corresponding to each data ID may include: sending a configuration instruction to the first storage device, wherein the configuration instruction comprises identification information of the bloom filter to be updated and a hash value of the data ID corresponding to the bloom filter, so that the first storage device configures the corresponding bloom filter according to the configuration instruction.
By storing the bloom filter in the external first storage device, the burden of a computing node can be effectively reduced, and the efficiency and accuracy of the bloom filter configuration can be improved.
The first storage device may be a Remote Dictionary Server (Redis) cluster, and the configuration instruction may be a bit operation instruction such as SETBIT, so that the corresponding bits of the bloom filter can be updated quickly and accurately.
The Redis cluster can comprise a plurality of Redis nodes, each Redis node can be deployed with a Redis instance, so that the read-write operation of the bloom filter can be dispersed to the Redis instances, and the read-write efficiency of the bloom filter is further improved.
Optionally, the bit operation instruction may take three parameters: key, offset, and value, which correspond to the identification information of the bloom filter, the hash value, and 1, respectively, indicating that the bit of the bloom filter at the position given by the hash value is set to 1. The computing node sends the bit operation instruction to the Redis cluster, and the bloom filter is configured through the Redis cluster, so that the bloom filter can indicate which data has been updated.
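To show what the three parameters do without requiring a running Redis cluster, the following Python sketch uses an in-memory stand-in that mimics the semantics of the Redis `SETBIT key offset value` and `GETBIT key offset` commands (offset 0 is the most significant bit of the first byte, and `SETBIT` returns the previous bit value). The class `BitStore` and the key name `bloom:demo` are hypothetical and are not part of the described system.

```python
class BitStore:
    """In-memory stand-in that mimics Redis SETBIT/GETBIT semantics."""

    def __init__(self) -> None:
        self._data: dict[str, bytearray] = {}

    def setbit(self, key: str, offset: int, value: int) -> int:
        """Set the bit at offset and return its previous value."""
        buf = self._data.setdefault(key, bytearray())
        byte_index, bit_index = divmod(offset, 8)
        if len(buf) <= byte_index:
            # Grow the string with zero bytes, as Redis does.
            buf.extend(b"\x00" * (byte_index + 1 - len(buf)))
        mask = 0x80 >> bit_index  # offset 0 is the most significant bit
        previous = 1 if buf[byte_index] & mask else 0
        if value:
            buf[byte_index] |= mask
        else:
            buf[byte_index] &= ~mask & 0xFF
        return previous

    def getbit(self, key: str, offset: int) -> int:
        """Return the bit at offset; missing keys and bytes read as 0."""
        buf = self._data.get(key, bytearray())
        byte_index, bit_index = divmod(offset, 8)
        if byte_index >= len(buf):
            return 0
        return 1 if buf[byte_index] & (0x80 >> bit_index) else 0


store = BitStore()
for h in (1, 2, 5):                   # hash values of one updated data ID
    store.setbit("bloom:demo", h, 1)  # key, offset, value
print([store.getbit("bloom:demo", i) for i in range(10)])
# [0, 1, 1, 0, 0, 1, 0, 0, 0, 0]
```

Against a real Redis deployment, the same three arguments would be sent as one `SETBIT` command per hash value.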
Furthermore, each computing node in the distributed system can update data, and simultaneously, the bloom filter can be configured according to the updated data ID, so that the common update maintenance of the data is realized in a mode of cooperation of each computing node.
Fig. 4 is a schematic diagram of a hardware architecture according to an embodiment of the disclosure. As shown in fig. 4, the distributed system includes a plurality of computing nodes, the external storage device includes a first storage device and a second storage device, and each computing node in the distributed system can communicate with the first storage device and the second storage device, wherein the first storage device is used for storing update information, namely bloom filters, and the second storage device is used for storing data, such as video data.
Each computing node may update the data stored in the second storage device. After updating the data in the second storage device, the computing node may update the bloom filter in the first storage device according to the updated data ID, so that other computing nodes can determine the updated data ID according to the updated bloom filter and then obtain the updated data corresponding to that data ID from the second storage device.
In this implementation manner, the first computing node may perform steps 201 to 203, and further obtain a data ID corresponding to at least one cached data, determine a hash value corresponding to each data ID according to a preset rule, and determine, for each cached data, whether the data needs to be updated according to the hash value corresponding to the data ID and a bloom filter configured by other computing nodes in the distributed system, that is, the first computing node may further implement the function implemented by the second computing node in the foregoing embodiment.
Specifically, each computing node may include an update configuration component and an update query component, wherein the update configuration component is used for configuring a bloom filter according to the data ID corresponding to updated data, and the update query component is used for querying, according to the bloom filter, whether the cached data needs to be updated.
Through the scheme shown in fig. 4, each computing node in the distributed system can maintain update information together, so that the update efficiency of data is improved, and the bloom filter and the data are stored in the first storage device and the second storage device respectively, so that the storage/query of the updated data ID and the data corresponding to the update can be realized by interacting with different external storage devices respectively, the burden of the external storage devices is reduced, and the efficiency of data storage is further improved.
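The interplay of the two components and the two storage devices can be sketched in Python as follows. Everything here is a simplified assumption for illustration: a bit list stands in for the first storage device, a dictionary stands in for the second storage device, and the names `ComputeNode`, `update_data`, and `refresh_cache` as well as the SHA-256 based hash functions are hypothetical.

```python
import hashlib

M, K = 4096, 3  # assumed filter size in bits and number of hash functions


def hash_values(data_id: str) -> list[int]:
    """K hash values in [0, M - 1]; the construction is an assumption."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{data_id}".encode()).digest()[:8], "big") % M
        for i in range(K)
    ]


class ComputeNode:
    def __init__(self, bloom_bits: list[int], data_store: dict[str, str]) -> None:
        self.bloom_bits = bloom_bits     # stands in for the first storage device
        self.data_store = data_store     # stands in for the second storage device
        self.cache: dict[str, str] = {}  # local in-memory copies

    def update_data(self, data_id: str, value: str) -> None:
        """Update configuration component: write data, then mark the filter."""
        self.data_store[data_id] = value
        self.cache[data_id] = value
        for h in hash_values(data_id):
            self.bloom_bits[h] = 1

    def refresh_cache(self) -> list[str]:
        """Update query component: reload every cached ID the filter flags."""
        refreshed = []
        for data_id in self.cache:
            if all(self.bloom_bits[h] == 1 for h in hash_values(data_id)):
                self.cache[data_id] = self.data_store[data_id]
                refreshed.append(data_id)
        return refreshed


bloom = [0] * M
store = {"a": "v1", "b": "v1"}
node1, node2 = ComputeNode(bloom, store), ComputeNode(bloom, store)
node2.cache = dict(store)      # node2 has cached both items
node1.update_data("a", "v2")   # node1 updates item "a"
node2.refresh_cache()
print(node2.cache["a"])        # v2
```

Because every node carries both components, any node can publish updates and any node can discover them, which matches the cooperative maintenance described above.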
On the basis of the technical solutions provided by the above embodiments, optionally, the corresponding bloom filter may be configured through an update cycle.
Fig. 5 is a schematic flow chart of configuring a bloom filter according to an embodiment of the present disclosure. As shown in fig. 5, configuring a bloom filter according to the hash value corresponding to each data ID may include:
Step 501, for each data ID, determining an update period in which the corresponding data is updated, and determining a bloom filter corresponding to the data ID according to the update period.
The update period may be set according to actual needs, for example, 10 seconds per update period. Each update period may correspond to one or more bloom filters.
Fig. 6 is a schematic diagram of a bloom filter configured for an update period according to an embodiment of the present disclosure. As shown in Fig. 6, one bloom filter may be provided for each update period. The i-th update period corresponds to bloom filter i, where i ranges from 1 to L, and L is an integer greater than or equal to 1 that represents the number of currently stored periods.
For data updated in a certain period, the corresponding data ID may be used to configure the bloom filter corresponding to that period. Since multiple pieces of data may be updated in one period, multiple data IDs may be used to configure the bloom filter of that period.
Step 502, configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID.
Optionally, the initial value of each bit of the bloom filter is a first value. After data is updated, for each data ID, setting the H-th bit in the bloom filter corresponding to the data ID to be a second value according to a plurality of hash values corresponding to the data ID, where a value of H traverses the plurality of hash values corresponding to the data ID.
In the case where the first value is 0 and the second value is 1, each bit of the bloom filter may be set to 0 when initializing the bloom filter.
Assuming that 3 pieces of data are updated in a certain period, the bloom filter corresponding to the period is the bloom filter 1, the data IDs of the three pieces of updated data are x, y, and z, respectively, and for each data ID, the corresponding hash value is calculated by a plurality of hash functions, respectively.
Assume that the hash functions are $H_1$, $H_2$, and $H_3$. For x, $H_1(x)$, $H_2(x)$, and $H_3(x)$ need to be calculated; similarly, for y and z, $H_1(y)$, $H_2(y)$, $H_3(y)$ and $H_1(z)$, $H_2(z)$, $H_3(z)$ need to be calculated. After these hash values are obtained, the corresponding bits of the corresponding bloom filter are configured, that is, bits $H_1(x)$, $H_2(x)$, $H_3(x)$, $H_1(y)$, $H_2(y)$, $H_3(y)$, $H_1(z)$, $H_2(z)$, and $H_3(z)$ of bloom filter 1 are all set to 1.
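Steps 501 and 502 can be sketched in Python as follows, assuming that the update period is derived from the update timestamp by integer division. The key naming scheme `bloom:<index>`, the function names, and the sample hash values are hypothetical; only the 10-second period and the [0, 4095] value space come from the examples in the text.

```python
UPDATE_PERIOD_SECONDS = 10  # example period length from the text
FILTER_BITS = 4096          # example hash value space [0, 4095]

filters: dict[str, list[int]] = {}  # filter key -> bit list


def period_index(timestamp: float, period: int = UPDATE_PERIOD_SECONDS) -> int:
    """Index of the update period that contains the timestamp (in seconds)."""
    return int(timestamp // period)


def filter_key(index: int) -> str:
    """Hypothetical naming scheme for the bloom filter of one period."""
    return f"bloom:{index}"


def record_update(timestamp: float, hash_values: list[int]) -> str:
    """Step 501: pick the filter for the period; step 502: set its bits."""
    key = filter_key(period_index(timestamp))
    bits = filters.setdefault(key, [0] * FILTER_BITS)
    for h in hash_values:
        bits[h] = 1
    return key


# Illustrative hash values for three updated data IDs.
print(record_update(1000.0, [17, 230, 4001]))  # bloom:100
print(record_update(1004.5, [5, 230, 999]))    # bloom:100
print(record_update(1012.0, [42, 77, 3000]))   # bloom:101
```

The first two updates fall into the same 10-second period and therefore share one bloom filter, while the third lands in the filter of the next period.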
Correspondingly, the determining, by the first computing node, for each cached data, whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system may include: determining a bloom filter corresponding to at least one history updating period; and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one historical updating period.
Specifically, the first computing node may query whether data needs to be updated once every preset query period. Optionally, the length of the query period may be equal to the length of the update period. In that case, the at least one history update period is the last update period, and the first computing node determines, once every update period, whether there is data that needs to be updated according to the bloom filter. Updates can thus be detected soon after they occur, balancing the cost and accuracy of the data query while ensuring data consistency.
In other alternative implementations, the length of the query period may be greater than or less than the length of the update period. When the length of the query period is greater than the length of the update period, the at least one history update period may be a plurality of update periods, for example, the update period is 10 seconds, and the query period is 20 seconds, and then the at least one history update period may include the last two update periods.
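The relationship between the two period lengths can be sketched as below. The rule used here (check as many past update periods as fit into one query period, rounded up, and at least one) is an assumption consistent with the 10-second and 20-second example, not a rule stated in the disclosure; `history_offsets` is a hypothetical helper name.

```python
import math

def history_offsets(query_period_s, update_period_s):
    """Return how many periods back to look: 1 = last update period, 2 = the one before, ...

    Assumption: the number of history update periods is
    ceil(query period / update period), with a minimum of one.
    """
    count = max(1, math.ceil(query_period_s / update_period_s))
    return list(range(1, count + 1))

offsets_equal = history_offsets(10, 10)   # query period equals update period
offsets_longer = history_offsets(20, 10)  # query period is twice the update period
```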
After determining the bloom filter corresponding to at least one history updating period, for any locally cached data, it may be determined whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one history updating period.
Specifically, it may be found whether the positions corresponding to the hash values of the data IDs in the bloom filters are all 1 according to each determined bloom filter, and if the corresponding positions in any one of the bloom filters are all 1, it indicates that the corresponding data needs to be updated.
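A minimal sketch of this check follows, assuming each bloom filter is held as a byte array with bit 0 of byte 0 as position 0. The helper names `bit_is_set` and `needs_update` are illustrative and not part of the disclosure.

```python
def bit_is_set(bloom, pos):
    """Return True if bit `pos` of the byte-array bloom filter is 1."""
    return bool(bloom[pos // 8] & (1 << (pos % 8)))

def needs_update(filters, positions):
    """True if, in any one filter, all positions for the data ID are 1."""
    return any(all(bit_is_set(f, p) for p in positions) for f in filters)

# Two history periods: the first filter has bits 3 and 10 set,
# the second has bits 3, 10, and 20 set.
older = bytearray(8)
for p in (3, 10):
    older[p // 8] |= 1 << (p % 8)
newer = bytearray(8)
for p in (3, 10, 20):
    newer[p // 8] |= 1 << (p % 8)

only_older = needs_update([older], [3, 10, 20])
both = needs_update([older, newer], [3, 10, 20])
```

Bits scattered across different filters do not count as a match; all positions must be 1 within the same filter.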
Through the above scheme, when data is updated, the bloom filter corresponding to each data ID is determined according to the update period in which the data was updated, and that bloom filter is configured according to the hash values corresponding to the data ID. Correspondingly, when querying for updates, the bloom filter corresponding to at least one history update period is determined, and the data that needs to be updated is determined according to that bloom filter. Data updating and data querying are thereby dispersed across time periods, which improves the real-time performance of updating and querying; the read and write operations on the bloom filters can also be dispersed across multiple Redis instances, which improves their read-write efficiency.
In addition, the scheme provided by the embodiment of the disclosure can also effectively reduce the communication cost of updating the cache. For example, if 100,000 pieces of data are stored in the local cache, 1000 pieces of data are updated in each period, and five 512-byte storage spaces are used to store the bloom filters, then in each period only the five 512-byte storage spaces need to be read, after which the 1000 pieces of data detected as updated are refreshed.
Based on the technical solutions provided by the above embodiments, each update period may optionally correspond to a plurality of bloom filters. Correspondingly, determining the bloom filter corresponding to the data ID according to the update cycle may include: and determining the bloom filter corresponding to the data ID from the plurality of bloom filters corresponding to the updating period according to the hash value corresponding to the data ID.
Fig. 7 is a schematic diagram of a bloom filter corresponding to an update period and a hash value setting according to an embodiment of the present disclosure. As shown in fig. 7, for each update cycle, a plurality of bloom filters may be provided. The ith update period corresponds to the bloom filter ij. The value of i is 1 to L, L is an integer greater than or equal to 1, L represents the number of currently stored cycles, the value of j is 1 to N, N is an integer greater than or equal to 2, and N represents the number of bloom filters in each update cycle.
In the case that one update cycle corresponds to a plurality of bloom filters, the data ID may be mapped to different bloom filters according to the hash value of the data ID, so that all the data IDs in one update cycle may be distributed to different bloom filters.
For example, the data corresponding to data IDs x, y, and z are all updated in the same update cycle, and, according to the hash values, x and y are mapped to bloom filter 11 and z is mapped to bloom filter 12. The hash values corresponding to x and y are then used to configure bloom filter 11, and the hash values corresponding to z are used to configure bloom filter 12.
Correspondingly, when querying for updates, determining a bloom filter corresponding to at least one history update period may include: for each history update period, determining the plurality of bloom filters corresponding to that period according to the period and the number of bloom filters in each update period. Determining whether the data corresponding to the data ID needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one history update period may include: for each history update period, searching, according to the hash value corresponding to the data ID, for the bloom filter corresponding to the data ID among the bloom filters corresponding to that period; and determining whether the data corresponding to the data ID needs to be updated according to the bloom filter found. Thus, when the bloom filters are read, if the first storage device has multiple instances providing service, the read operations can be dispersed and the pressure on each storage instance reduced.
Further, when data is updated, determining a bloom filter corresponding to the data ID from among the bloom filters corresponding to the update period according to the hash value corresponding to the data ID may include: determining the updating time of the data corresponding to the data ID, and performing integer division operation on the updating time to the duration of an updating period to obtain period information, wherein the period information is used for indicating the updating period of the updating time of the data; performing modular operation on the number of bloom filters corresponding to one updating period by using the hash value corresponding to the data ID to obtain corresponding sequence information; and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
Assume that the current timestamp when a certain piece of data is updated is T, and that the hash values calculated by K hash functions H0, H1, …, HK-1 are h0 to hK-1, respectively. A key, which is the identification information of the bloom filter, may then be generated with the hash values h0 to hK-1 and T as parameters.
Specifically, the period information T/M and the sequence information h0 % N can be spliced to obtain the key, where M is the duration of an update period, e.g., 10 seconds, N is the number of bloom filters corresponding to an update period, and h0 is the first hash value corresponding to the data ID, i.e., the hash value obtained by the first hash function H0.
T is integer-divided by M, and the resulting period information indicates the update period in which the data was updated, i.e., in which cycle the data was updated. The purpose of h0 % N is to scatter the multiple data IDs updated in the same period evenly over multiple keys.
When the update period and h0 corresponding to two data IDs are both the same, their keys are the same, i.e., they correspond to the same bloom filter; when the update periods corresponding to two data IDs are different, their keys are also different; when the update periods of two data IDs are the same but their h0 values are different, their keys may be the same or different.
T/M scatters data IDs over time, and h0 % N further scatters the data IDs according to the hash value, thereby dispersing the read and write operations. Of course, other similar methods may be chosen, such as replacing h0 with another hash value or with the average of the hash values.
Optionally, the splicing the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID may include: and splicing a preset Prefix (Prefix) with the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
Fig. 8 is a schematic diagram of determining identification information of a bloom filter during data update according to an embodiment of the present disclosure. As shown in fig. 8, the identification information key is formed by splicing three parts, namely, a preset prefix, T/M, and h0 % N.
The preset prefix is used to indicate that the identification information identifies a bloom filter rather than some other variable. The specific prefix may be a fixed value set according to actual needs, so as to identify the bloom filter and avoid collision with the identifiers of other information. Optionally, the preset prefix may be different from the prefixes of all other variables in the distributed system.
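The splicing of prefix, period information, and sequence information at update time can be sketched as follows. The separator `":"`, the prefix value `"bf"`, the default of four bloom filters per period, and the function name `bloom_key` are assumptions chosen for illustration; the disclosure only specifies that the three parts are spliced.

```python
def bloom_key(update_ts, h0, period_s=10, n_filters=4, prefix="bf"):
    """Build the bloom filter key for a data ID updated at `update_ts`.

    update_ts: update timestamp T in seconds
    h0:        first hash value of the data ID
    period_s:  update period duration M
    n_filters: number of bloom filters per update period N
    """
    cycle = update_ts // period_s   # period information T/M (integer division)
    seq = h0 % n_filters            # sequence information h0 % N
    return f"{prefix}:{cycle}:{seq}"

same_a = bloom_key(1005, 7)     # same period, h0 % N == 3
same_b = bloom_key(1009, 11)    # same period, different h0 but also 11 % 4 == 3
next_period = bloom_key(1010, 7)
```

Here `same_a` and `same_b` share a key even though their h0 values differ, which matches the case where two data IDs in the same period have different h0 but the same remainder.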
At the time of data updating, for each updated data ID, the storage space of the key corresponding to the updated data ID is found, and then the position corresponding to the plurality of hash values of the data ID is set to 1, thereby completing the configuration of the bloom filter.
Specifically, a bit operation instruction of the first storage device may be used to set the value of the key in the first storage device, that is, the h0-th bit, h1-th bit, …, hK-1-th bit of the corresponding bloom filter, to 1.
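If the first storage device is Redis, one candidate for the bit operation instruction is `SETBIT key offset value`; the disclosure does not name the specific instruction, so this choice is an assumption. The sketch below only builds the command tuples and does not contact a server; `setbit_commands` is a hypothetical helper.

```python
def setbit_commands(key, positions):
    """Return one ("SETBIT", key, offset, 1) command per distinct bit position."""
    return [("SETBIT", key, pos, 1) for pos in sorted(set(positions))]

# Example: hash values h0..h2 of one data ID, two of which coincide.
commands = setbit_commands("bf:100:3", [5, 2, 5])
```

Deduplicating the positions saves a round trip when two hash values coincide; setting the same bit twice would also be harmless.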
When querying for updates, searching, according to the hash value corresponding to the data ID, for the bloom filter corresponding to the data ID among the bloom filters corresponding to the history update period may include: determining corresponding period information according to the history update period; performing a modulo operation on the hash value corresponding to the data ID with respect to the number of bloom filters corresponding to one update period to obtain corresponding sequence information; and splicing the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
Further, if a preset prefix is added to the key of each bloom filter during data updating, a corresponding preset prefix may also be added to the obtained identification information during updating query.
Optionally, determining corresponding period information according to the history update period includes: performing integer division operation on the duration of one updating period by the current time; subtracting the offset from the result of the integer division operation to obtain corresponding period information; wherein the offset is determined by a number of cycles of an interval between the history update period and a current update period.
Fig. 9 is a schematic diagram of determining identification information of a bloom filter when querying for updates according to an embodiment of the present disclosure. As shown in fig. 9, for each cached data, the identification information of the corresponding bloom filter can be obtained by splicing three parts: the preset prefix, T/M-O, and h0 % N.
Here, T is the current timestamp corresponding to the query, M is the update period, h0 is the first hash value corresponding to the data ID, and O is an offset, which may be 1 when the polling operation is performed once every update period M.
By the way shown in fig. 9, the identification information of the corresponding bloom filter may be calculated, and the corresponding bloom filter may be found by the identification information, and it is determined whether the positions corresponding to the hash values of the data IDs are all 1, thereby determining whether the data needs to be updated.
In practical applications, the query operation may be performed once every update period M, and N keys may be constructed in each query. Specifically, N keys are obtained by using the preset prefix Prefix (the prefix is optional), T/M-O, and n as parameters, with n traversing 1 to N. The keys are constructed in the same way as during data updating, so that all keys of the previous update period are obtained.
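Constructing all N keys of a history update period at query time can be sketched as follows. The key layout (`":"` separator, prefix `"bf"`) and the name `query_keys` are assumptions. The sequence index here runs from 0 to N-1 so that it matches the remainders produced by h0 % N; that numbering is also an assumption of this sketch.

```python
def query_keys(now_ts, period_s=10, n_filters=4, offset=1, prefix="bf"):
    """Return the keys of all bloom filters of the period `offset` periods back.

    now_ts:  current timestamp T of the query, in seconds
    offset:  O, the number of periods between the queried period and the current one
    """
    cycle = now_ts // period_s - offset  # period information T/M - O
    return [f"{prefix}:{cycle}:{seq}" for seq in range(n_filters)]

keys = query_keys(1015)  # current period 101, so the previous period is 100
```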
After all the keys in the previous update period are obtained, the bloom filters corresponding to the N keys may be read from the first storage device, and the hash table is used to maintain the correspondence between the N keys and the bloom filters. The N keys and the N bloom filters are in a one-to-one correspondence relationship, that is, the ith key corresponds to the ith bloom filter, and the value of i is from 1 to N.
Then, all the data in the memory cache is traversed, and for each data ID, whether it has been updated is detected as follows: K hash values h0 to hK-1 are computed using K hash functions, which are the same K hash functions used to configure the bloom filter during data updating; the corresponding key is then obtained from T and h0, the corresponding bloom filter is looked up in the hash table, and whether the data needs to be updated is determined according to that bloom filter.
In particular, the h0-th bit, h1-th bit, …, hK-1-th bit can be checked in the bloom filter. If the bit values are all 1, the data has probably been updated in the second storage device; in this case, the data needs to be reloaded from the second storage device and the local cache needs to be updated. If at least one bit is 0, the data has definitely not been updated and does not need to be reloaded.
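The update and polling sides can be put together in one small in-memory sketch. A plain dictionary stands in for the first storage device, and the constants, the salted SHA-256 hash family, the key layout, and the names `record_update` and `ids_to_reload` are all assumptions made for illustration.

```python
import hashlib

K = 3               # number of hash functions (assumed)
FILTER_BITS = 4096  # bits per bloom filter (assumed: 512 bytes)
N = 4               # bloom filters per update period (assumed)
M = 10              # update period length in seconds (example value from the text)

def hashes(data_id):
    """Hash values h0..h(K-1), used as bit positions (salted SHA-256 is an assumption)."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{data_id}".encode()).digest()[:8], "big") % FILTER_BITS
        for i in range(K)
    ]

def key_for(cycle, h0):
    """Key = prefix, period information, sequence information h0 % N."""
    return f"bf:{cycle}:{h0 % N}"

def record_update(store, update_ts, data_id):
    """Writer side: set the K bits of the filter chosen by T/M and h0 % N."""
    hs = hashes(data_id)
    bloom = store.setdefault(key_for(update_ts // M, hs[0]), bytearray(FILTER_BITS // 8))
    for h in hs:
        bloom[h // 8] |= 1 << (h % 8)

def ids_to_reload(store, now_ts, cached_ids, offset=1):
    """Reader side: return cached IDs whose K bits are all 1 in the queried period."""
    cycle = now_ts // M - offset
    reload = []
    for data_id in cached_ids:
        hs = hashes(data_id)
        bloom = store.get(key_for(cycle, hs[0]))
        if bloom is not None and all(bloom[h // 8] & (1 << (h % 8)) for h in hs):
            reload.append(data_id)
    return reload

store = {}
record_update(store, 1003, "x")               # "x" updated in period 100
stale = ids_to_reload(store, 1012, ["x", "y"])  # polled in period 101, looks at period 100
```

An ID reported here may be a false positive, so a reload can occasionally be unnecessary; an ID that is not reported was definitely not recorded in the queried period.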
By the method, the key of the bloom filter can be generated by using a specific mechanism, so that the read-write operation is uniformly scattered to a plurality of first storage device instances, and the write pressure and the total storage amount of each first storage device instance are reduced.
On the basis of the technical solution provided by the foregoing embodiment, optionally, an update cleaning component may be provided in the first computing node, and is used to clean an expired bloom filter.
In one alternative, bloom filters from before a preset time may be deleted once every preset cleaning period. The cleaning period can be set according to actual needs and may be, for example, 1 day.
Specifically, the update cleaning component may periodically perform cleaning operation, delete the update information before the preset time in the first storage device, and implement active cleaning of the bloom filter.
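One way the cleaning component could select keys to delete is sketched below, assuming keys have the form `prefix:cycle:sequence` so that the period number can be parsed back out of the key. The key layout, the one-day retention default, and the name `expired_keys` are assumptions; the disclosure does not specify how expired filters are identified.

```python
def expired_keys(keys, now_ts, period_s=10, retention_s=86400, prefix="bf"):
    """Return bloom filter keys whose update period ended before the retention window."""
    oldest_kept_cycle = (now_ts - retention_s) // period_s
    expired = []
    for key in keys:
        parts = key.split(":")
        # Skip anything that is not a bloom filter key of the assumed layout.
        if len(parts) != 3 or parts[0] != prefix or not parts[1].isdigit():
            continue
        if int(parts[1]) < oldest_kept_cycle:
            expired.append(key)
    return expired

to_delete = expired_keys(["bf:1359:0", "bf:1360:1", "other:1:0"], now_ts=100000)
```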
In another alternative, after the bloom filter is configured, the expiration time corresponding to the bloom filter may be sent to a first storage device storing the bloom filter, so that the first storage device automatically deletes the bloom filter according to the expiration time.
Specifically, if the first storage device supports an expiration deletion mechanism (for example, a Redis cluster provides the expire instruction for setting an expiration time on data), an expiration time may be set for the corresponding key after the configuration component completes configuration of the bloom filter, so that the first storage device deletes the key by itself and the burden on the computing node is reduced.
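With Redis, the expiration time can be set with `EXPIRE key seconds`. The sketch below only builds the command tuple and does not contact a server; the helper name `expire_command` and the one-day time-to-live are assumptions.

```python
def expire_command(key, ttl_seconds=86400):
    """Return the ("EXPIRE", key, seconds) command for a configured bloom filter key."""
    if ttl_seconds <= 0:
        raise ValueError("ttl_seconds must be positive")
    return ("EXPIRE", key, ttl_seconds)

command = expire_command("bf:100:3")
```

The time-to-live should be at least as long as the oldest history update period that any computing node may still query, otherwise a filter could disappear before it is read.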
In other alternative implementations, both schemes may be used simultaneously or alternately.
By the scheme, the outdated bloom filter can be cleaned in time, the occupation of the storage space of the first storage device is reduced, and the storage performance of the first storage device is optimized.
Optionally, in the distributed system, an update configuration component, an update query component, and an update cleaning component may be set in each computing node; alternatively, some computing nodes may also be provided with only some components, for example, only the update configuration component and the update query component are provided, and the update cleaning component is not provided, which is not limited by the embodiments of the present disclosure.
Corresponding to the data updating method provided by the above embodiment, the embodiment of the present disclosure further provides a data updating apparatus, which is applied to a first computing node in a distributed system. Fig. 10 is a block diagram of a data updating apparatus according to an embodiment of the present disclosure. For ease of illustration, only portions that are relevant to embodiments of the present disclosure are shown. Referring to fig. 10, the apparatus may include:
a determining module 1001, configured to determine a data ID corresponding to at least one updated data;
the calculating module 1002 is configured to determine, according to a preset rule, a hash value corresponding to each data ID;
a configuration module 1003, configured to configure a bloom filter according to the hash value corresponding to each data ID, so that the second computing node in the distributed system determines, according to the bloom filter, data that needs to be updated.
In an embodiment of the present disclosure, the configuration module 1003 is specifically configured to:
for each data ID, determining an updating period of the corresponding data updating time, and determining a bloom filter corresponding to the data ID according to the updating period;
and configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID.
In one embodiment of the present disclosure, each update period corresponds to a plurality of bloom filters;
correspondingly, when determining the bloom filter corresponding to the data ID according to the update cycle, the configuration module 1003 is specifically configured to:
and determining the bloom filter corresponding to the data ID from the plurality of bloom filters corresponding to the updating period according to the hash value corresponding to the data ID.
In an embodiment of the present disclosure, when determining, according to the hash value corresponding to the data ID, a bloom filter corresponding to the data ID from among the plurality of bloom filters corresponding to the update period, the configuration module 1003 is specifically configured to:
determining the updating time of the data corresponding to the data ID, and performing integer division operation on the updating time to the duration of an updating period to obtain period information, wherein the period information is used for indicating the updating period of the updating time of the data;
performing modular operation on the number of bloom filters corresponding to one updating period by using the hash value corresponding to the data ID to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
In an embodiment of the present disclosure, when the configuration module 1003 splices the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID, specifically, the configuration module is configured to:
splicing a preset prefix with the period information and the sequence information to obtain identification information of the bloom filter corresponding to the data ID;
the preset prefix is used for indicating that the identification information is identification information of a bloom filter.
In one embodiment of the present disclosure, the initial value of each bit of the bloom filter is a first value;
when determining the hash value corresponding to each data ID according to the preset rule, the configuration module 1003 is specifically configured to: for each data ID, calculating the data ID through a plurality of preset hash functions to obtain a plurality of corresponding hash values;
the configuration module 1003, when configuring the corresponding bit of the corresponding bloom filter according to the hash value corresponding to each data ID, is specifically configured to: and for each data ID, setting the H-th bit in the bloom filter corresponding to the data ID as a second numerical value according to a plurality of hash values corresponding to the data ID, wherein the value of H traverses the plurality of hash values corresponding to the data ID.
In one embodiment of the present disclosure, the bloom filter is stored in a first storage device;
the configuration module 1003 is specifically configured to: sending a configuration instruction to the first storage device, wherein the configuration instruction comprises identification information of the bloom filter to be updated and a hash value of the data ID corresponding to the bloom filter, so that the first storage device configures the corresponding bloom filter according to the configuration instruction.
In an embodiment of the present disclosure, the first storage device is a Redis cluster, and the configuration instruction is a bit operation instruction.
In an embodiment of the present disclosure, the configuration module 1003 is further configured to:
acquiring a data ID corresponding to at least one cached data;
determining a hash value corresponding to each data ID according to a preset rule;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system.
In an embodiment of the present disclosure, the configuration module 1003, when determining, for each cached data, whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system, is specifically configured to:
determining a bloom filter corresponding to at least one history updating period;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one historical updating period.
In one embodiment of the present disclosure, each update period corresponds to a plurality of bloom filters;
when determining the bloom filter corresponding to at least one history update period, the configuration module 1003 is specifically configured to:
for each history updating period, determining a plurality of bloom filters corresponding to the history updating period according to the history updating period and the number of the bloom filters in each updating period;
correspondingly, determining whether the data corresponding to the data ID needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one history updating period includes:
for each history updating period, searching a bloom filter corresponding to the data ID from bloom filters corresponding to the data ID in the history updating period according to the hash value corresponding to the data ID;
and determining whether the data corresponding to the data ID needs to be updated according to the searched bloom filter.
In an embodiment of the present disclosure, when searching for the bloom filter corresponding to the data ID from the corresponding bloom filters in the history update period according to the hash value corresponding to the data ID, the configuration module 1003 is specifically configured to:
determining corresponding period information according to the historical updating period;
performing modular operation on the number of bloom filters corresponding to one updating period by using the hash value corresponding to the data ID to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
In an embodiment of the present disclosure, when determining the corresponding period information according to the history update period, the configuration module 1003 is specifically configured to:
performing integer division operation on the duration of one updating period by the current time;
subtracting the offset from the result of the integer division operation to obtain corresponding period information;
wherein the offset is determined by a number of cycles of an interval between the history update period and a current update period.
In one embodiment of the present disclosure, the at least one history update period is the last update period, and the first computing node determines, once every update period, whether there is data that needs to be updated according to a bloom filter.
In an embodiment of the present disclosure, the configuration module 1003 is further configured to:
deleting, once every preset cleaning period, bloom filters from before a preset time; and/or,
after the bloom filter is configured, sending an expiration time corresponding to the bloom filter to a first storage device storing the bloom filter, so that the first storage device automatically deletes the bloom filter according to the expiration time.
The apparatus provided in this embodiment may be used to implement the technical solutions of the above method embodiments, and the implementation principles and technical effects are similar, which are not described herein again.
Fig. 11 is a block diagram of an electronic device according to an embodiment of the present disclosure. Referring to fig. 11, the electronic device 1100 may be a terminal device or a server. The terminal device may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a Personal Digital Assistant (PDA), a tablet computer (PAD), a Portable Multimedia Player (PMP), or a car terminal (e.g., a car navigation terminal), and a fixed terminal such as a digital TV or a desktop computer. The electronic device shown in fig. 11 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 11, the electronic device 1100 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 1101, which may perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1102 or a program loaded from a storage means 1108 into a Random Access Memory (RAM) 1103. In the RAM 1103, various programs and data necessary for the operation of the electronic device 1100 are also stored. The processing device 1101, the ROM1102, and the RAM 1103 are connected to each other by a bus 1104. An input/output (I/O) interface 1105 is also connected to bus 1104.
Generally, the following devices may be connected to the I/O interface 1105: input devices 1106 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 1107 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 1108, including, for example, magnetic tape, hard disk, etc.; and a communication device 1109. The communication means 1109 may allow the electronic device 1100 to communicate wirelessly or wiredly with other devices to exchange data. While fig. 11 illustrates an electronic device 1100 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication device 1109, or installed from the storage device 1108, or installed from the ROM 1102. The computer program, when executed by the processing device 1101, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to perform the methods shown in the above embodiments.
Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The name of a unit does not, in some cases, constitute a limitation on the unit itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In a first aspect, according to one or more embodiments of the present disclosure, there is provided a data updating method applied to a first computing node in a distributed system, the method including:
determining a data ID corresponding to the updated at least one data;
determining a hash value corresponding to each data ID according to a preset rule;
and configuring a bloom filter according to the hash value corresponding to each data ID, so that a second computing node in the distributed system determines the data needing to be updated according to the bloom filter.
According to one or more embodiments of the present disclosure, configuring a bloom filter according to a hash value corresponding to each data ID includes:
for each data ID, determining an updating period of the corresponding data updating time, and determining a bloom filter corresponding to the data ID according to the updating period;
and configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID.
According to one or more embodiments of the present disclosure, each update period corresponds to a plurality of bloom filters;
correspondingly, determining the bloom filter corresponding to the data ID according to the update cycle includes:
and determining the bloom filter corresponding to the data ID from the plurality of bloom filters corresponding to the updating period according to the hash value corresponding to the data ID.
According to one or more embodiments of the present disclosure, determining, according to the hash value corresponding to the data ID, a bloom filter corresponding to the data ID from among the plurality of bloom filters corresponding to the update period includes:
determining the update time of the data corresponding to the data ID, and performing integer division of the update time by the duration of one update period to obtain period information, wherein the period information indicates the update period in which the update time of the data falls;
taking the hash value corresponding to the data ID modulo the number of bloom filters corresponding to one update period to obtain corresponding sequence information;
and splicing the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
According to one or more embodiments of the present disclosure, the splicing the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID includes:
splicing a preset prefix with the period information and the sequence information to obtain identification information of the bloom filter corresponding to the data ID;
the preset prefix is used for indicating that the identification information is identification information of a bloom filter.
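The derivation of the identification information described above can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the function name, the prefix "bf", the ":" separator, the 60-second period, and the count of four bloom filters per period are all assumptions chosen for the example.

```python
def bloom_filter_key(update_time_s: int, id_hash: int,
                     period_s: int = 60, filters_per_period: int = 4,
                     prefix: str = "bf") -> str:
    """Build the identification information of the bloom filter for one data ID.

    All default values and the ':' separator are illustrative assumptions.
    """
    # Period information: integer division of the update time by the period duration.
    period_info = update_time_s // period_s
    # Sequence information: hash value modulo the number of filters per period.
    sequence_info = id_hash % filters_per_period
    # Splice prefix, period information, and sequence information.
    return f"{prefix}:{period_info}:{sequence_info}"
```

For instance, an update at second 125 with hash value 10 falls in period 2 and maps to sequence 2, so every node that applies the same rule arrives at the same bloom filter without coordination.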
According to one or more embodiments of the present disclosure, the initial value of each bit of the bloom filter is a first numerical value;
determining a hash value corresponding to each data ID according to a preset rule includes: for each data ID, applying a plurality of preset hash functions to the data ID to obtain a plurality of corresponding hash values;
configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID, including: and for each data ID, setting the H-th bit in the bloom filter corresponding to the data ID as a second numerical value according to a plurality of hash values corresponding to the data ID, wherein the value of H traverses the plurality of hash values corresponding to the data ID.
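The bit configuration described above might look like the following sketch. It assumes the first numerical value is 0 and the second is 1, and it stands in salted SHA-256 digests for the "plurality of preset hash functions"; the disclosure does not name specific hash functions, filter sizes, or function names, so all of those are hypothetical here.

```python
import hashlib
from typing import List

NUM_BITS = 1024      # assumed filter size in bits
NUM_HASHES = 3       # assumed number of preset hash functions
FIRST_VALUE = 0      # assumed initial value of every bit
SECOND_VALUE = 1     # assumed value written for an updated data ID


def hash_values(data_id: str) -> List[int]:
    """Return NUM_HASHES bit positions for data_id (salted SHA-256 is an assumption)."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{data_id}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def new_filter() -> List[int]:
    """Create a filter whose bits all hold the first numerical value."""
    return [FIRST_VALUE] * NUM_BITS


def mark_updated(bits: List[int], data_id: str) -> None:
    """Set the H-th bit to the second value for every hash value H of data_id."""
    for h in hash_values(data_id):
        bits[h] = SECOND_VALUE
```

Because the hash values are deterministic, a node that later recomputes them for the same data ID will inspect exactly the bits that were set here.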
In accordance with one or more embodiments of the present disclosure, the bloom filter is stored in a first storage device;
configuring a bloom filter according to the hash value corresponding to each data ID, comprising:
sending a configuration instruction to the first storage device, wherein the configuration instruction comprises identification information of the bloom filter to be updated and a hash value of the data ID corresponding to the bloom filter, so that the first storage device configures the corresponding bloom filter according to the configuration instruction.
According to one or more embodiments of the present disclosure, the first storage device is a Redis cluster, and the configuration instruction is a bit operation instruction.
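As an illustration of a bit operation instruction, Redis provides the `SETBIT key offset value` command. The sketch below only builds the command tuples and does not open a connection; the helper name and the example key are hypothetical, and how the commands are sent (for example through a client library, possibly pipelined) is left open.

```python
from typing import List, Tuple


def setbit_commands(filter_key: str, hash_values: List[int]) -> List[Tuple[str, str, int, int]]:
    """Build one SETBIT command per hash value for the given bloom filter key.

    The key layout is an assumption; SETBIT itself is a standard Redis command.
    """
    return [("SETBIT", filter_key, h, 1) for h in hash_values]
```

With a client such as redis-py, each tuple would correspond to a call of the form `client.setbit(filter_key, h, 1)`, so the first computing node never has to read the filter back before writing to it.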
According to one or more embodiments of the present disclosure, the method further comprises:
acquiring a data ID corresponding to at least one cached data;
determining a hash value corresponding to each data ID according to a preset rule;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system.
According to one or more embodiments of the present disclosure, for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system includes:
determining a bloom filter corresponding to at least one history updating period;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one historical updating period.
According to one or more embodiments of the present disclosure, each update period corresponds to a plurality of bloom filters;
determining a bloom filter corresponding to at least one history update period, comprising:
for each history updating period, determining a plurality of bloom filters corresponding to the history updating period according to the history updating period and the number of the bloom filters in each updating period;
correspondingly, determining whether the data corresponding to the data ID needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one history updating period includes:
for each history updating period, searching a bloom filter corresponding to the data ID from bloom filters corresponding to the data ID in the history updating period according to the hash value corresponding to the data ID;
and determining whether the data corresponding to the data ID needs to be updated according to the searched bloom filter.
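The check performed for cached data can be sketched as follows. The example keeps the bloom filters in an in-memory dictionary that maps identification information to the set of bit positions already set; that storage layout, the salted SHA-256 hashing, the key format, and the use of the first hash value to select the filter are all assumptions made for illustration.

```python
import hashlib
from typing import Dict, List, Set

NUM_BITS = 1024            # assumed filter size in bits
NUM_HASHES = 3             # assumed number of preset hash functions
FILTERS_PER_PERIOD = 4     # assumed number of bloom filters per update period


def hash_values(data_id: str) -> List[int]:
    """Bit positions for data_id; must match the rule used when writing."""
    positions = []
    for seed in range(NUM_HASHES):
        digest = hashlib.sha256(f"{seed}:{data_id}".encode("utf-8")).digest()
        positions.append(int.from_bytes(digest[:8], "big") % NUM_BITS)
    return positions


def filter_key(period_info: int, data_id: str) -> str:
    """Pick the filter for data_id within one period (first hash value, an assumption)."""
    sequence_info = hash_values(data_id)[0] % FILTERS_PER_PERIOD
    return f"bf:{period_info}:{sequence_info}"


def needs_update(store: Dict[str, Set[int]], data_id: str,
                 history_periods: List[int]) -> bool:
    """Return True if any history period's filter has all bits for data_id set."""
    positions = hash_values(data_id)
    for period_info in history_periods:
        set_bits = store.get(filter_key(period_info, data_id), set())
        if all(h in set_bits for h in positions):
            return True
    return False
```

As with any bloom filter, a positive answer may occasionally be a false positive, which only causes an unnecessary refresh, whereas an ID that was actually marked in a checked period is never missed.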
According to one or more embodiments of the present disclosure, searching for the bloom filter corresponding to the data ID from the corresponding bloom filters in the history update period according to the hash value corresponding to the data ID includes:
determining corresponding period information according to the historical updating period;
taking the hash value corresponding to the data ID modulo the number of bloom filters corresponding to one update period to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
According to one or more embodiments of the present disclosure, determining corresponding period information according to a history update period includes:
performing integer division of the current time by the duration of one update period;
subtracting the offset from the result of the integer division operation to obtain corresponding period information;
wherein the offset is determined by a number of cycles of an interval between the history update period and a current update period.
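The period information of a history update period can be worked through with a short sketch. The function and parameter names are hypothetical, and times are assumed to be integer seconds.

```python
def history_period_info(now_s: int, period_s: int, periods_back: int) -> int:
    """Period information for the update period `periods_back` periods before the current one."""
    current_period = now_s // period_s      # integer division of the current time
    return current_period - periods_back    # subtract the offset
```

With a 60-second period and a current time of 185 seconds, the current period is 3, so an offset of 1 yields period information 2, which is the period whose bloom filters were completed most recently.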
According to one or more embodiments of the present disclosure, the at least one history update period is the previous update period, and the first computing node determines, once per update period, whether there is data that needs to be updated according to a bloom filter.
According to one or more embodiments of the present disclosure, the method further comprises:
deleting, at every preset cleaning period, the bloom filters dated before a preset time; and/or
after the bloom filter is configured, sending an expiration time corresponding to the bloom filter to a first storage device storing the bloom filter, so that the first storage device automatically deletes the bloom filter according to the expiration time.
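Both cleanup options can be sketched briefly. The example assumes identification information of the form `bf:<period>:<sequence>`, which is a hypothetical layout, and it builds a Redis `EXPIRE key seconds` command tuple without contacting a server; the function names and the retention count are likewise assumptions.

```python
from typing import List, Tuple


def stale_keys(keys: List[str], now_s: int, period_s: int, keep_periods: int) -> List[str]:
    """Return the keys whose period is older than the retention window.

    Assumes each key looks like 'bf:<period>:<sequence>'.
    """
    cutoff = now_s // period_s - keep_periods
    return [k for k in keys if int(k.split(":")[1]) < cutoff]


def expire_command(filter_key: str, ttl_s: int) -> Tuple[str, str, int]:
    """Build an EXPIRE command so the storage device deletes the filter on its own."""
    return ("EXPIRE", filter_key, ttl_s)
```

Setting an expiration time at configuration time removes the need for a separate sweep, while the periodic sweep works with storage devices that do not support expiry; the two can also be combined.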
In a second aspect, according to one or more embodiments of the present disclosure, there is provided a data update apparatus applied to a first computing node in a distributed system, the apparatus including:
the determining module is used for determining a data ID corresponding to the updated at least one data;
the calculation module is used for determining hash values corresponding to the data IDs according to preset rules;
and the configuration module is used for configuring the bloom filter according to the hash value corresponding to each data ID, so that the second computing node in the distributed system determines the data needing to be updated according to the bloom filter.
According to one or more embodiments of the present disclosure, the configuration module 1003 is specifically configured to:
for each data ID, determining an updating period of the corresponding data updating time, and determining a bloom filter corresponding to the data ID according to the updating period;
and configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID.
According to one or more embodiments of the present disclosure, each update period corresponds to a plurality of bloom filters;
correspondingly, when determining the bloom filter corresponding to the data ID according to the update cycle, the configuration module is specifically configured to:
and determining the bloom filter corresponding to the data ID from the plurality of bloom filters corresponding to the updating period according to the hash value corresponding to the data ID.
According to one or more embodiments of the present disclosure, when determining, according to the hash value corresponding to the data ID, a bloom filter corresponding to the data ID from among the plurality of bloom filters corresponding to the update period, the configuration module is specifically configured to:
determining the update time of the data corresponding to the data ID, and performing integer division of the update time by the duration of one update period to obtain period information, wherein the period information indicates the update period in which the update time of the data falls;
taking the hash value corresponding to the data ID modulo the number of bloom filters corresponding to one update period to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
According to one or more embodiments of the present disclosure, when the configuration module splices the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID, the configuration module is specifically configured to:
splicing a preset prefix with the period information and the sequence information to obtain identification information of the bloom filter corresponding to the data ID;
the preset prefix is used for indicating that the identification information is identification information of a bloom filter.
According to one or more embodiments of the present disclosure, the initial value of each bit of the bloom filter is a first numerical value;
when determining the hash value corresponding to each data ID according to the preset rule, the configuration module is specifically configured to: for each data ID, calculating the data ID through a plurality of preset hash functions to obtain a plurality of corresponding hash values;
the configuration module is specifically configured to, when configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID: and for each data ID, setting the H-th bit in the bloom filter corresponding to the data ID as a second numerical value according to a plurality of hash values corresponding to the data ID, wherein the value of H traverses the plurality of hash values corresponding to the data ID.
In accordance with one or more embodiments of the present disclosure, the bloom filter is stored in a first storage device;
the configuration module is specifically configured to: sending a configuration instruction to the first storage device, wherein the configuration instruction comprises identification information of the bloom filter to be updated and a hash value of the data ID corresponding to the bloom filter, so that the first storage device configures the corresponding bloom filter according to the configuration instruction.
According to one or more embodiments of the present disclosure, the first storage device is a Redis cluster, and the configuration instruction is a bit operation instruction.
In accordance with one or more embodiments of the present disclosure, the configuration module is further configured to:
acquiring a data ID corresponding to at least one cached data;
determining a hash value corresponding to each data ID according to a preset rule;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system.
According to one or more embodiments of the present disclosure, when determining, for each cached data, according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system, whether the data needs to be updated, the configuration module is specifically configured to:
determining a bloom filter corresponding to at least one history updating period;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one historical updating period.
According to one or more embodiments of the present disclosure, each update period corresponds to a plurality of bloom filters;
when determining the bloom filter corresponding to at least one history update period, the configuration module is specifically configured to:
for each history updating period, determining a plurality of bloom filters corresponding to the history updating period according to the history updating period and the number of the bloom filters in each updating period;
correspondingly, determining whether the data corresponding to the data ID needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one history updating period includes:
for each history updating period, searching a bloom filter corresponding to the data ID from bloom filters corresponding to the data ID in the history updating period according to the hash value corresponding to the data ID;
and determining whether the data corresponding to the data ID needs to be updated according to the searched bloom filter.
According to one or more embodiments of the present disclosure, when the configuration module searches for the bloom filter corresponding to the data ID from the corresponding bloom filters in the history update period according to the hash value corresponding to the data ID, the configuration module is specifically configured to:
determining corresponding period information according to the historical updating period;
taking the hash value corresponding to the data ID modulo the number of bloom filters corresponding to one update period to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
According to one or more embodiments of the present disclosure, when determining the corresponding period information according to the history update period, the configuration module is specifically configured to:
performing integer division of the current time by the duration of one update period;
subtracting the offset from the result of the integer division operation to obtain corresponding period information;
wherein the offset is determined by a number of cycles of an interval between the history update period and a current update period.
According to one or more embodiments of the present disclosure, the at least one history update period is the previous update period, and the first computing node determines, once per update period, whether there is data that needs to be updated according to a bloom filter.
In accordance with one or more embodiments of the present disclosure, the configuration module is further configured to:
deleting, at every preset cleaning period, the bloom filters dated before a preset time; and/or
after the bloom filter is configured, sending an expiration time corresponding to the bloom filter to a first storage device storing the bloom filter, so that the first storage device automatically deletes the bloom filter according to the expiration time.
In a third aspect, according to one or more embodiments of the present disclosure, there is provided an electronic device including: a memory and at least one processor;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the data update method as described above in the first aspect and various possible designs of the first aspect.
In a fourth aspect, according to one or more embodiments of the present disclosure, there is provided a computer-readable storage medium having stored therein computer-executable instructions that, when executed by a processor, implement the data update method as described in the first aspect above and in various possible designs of the first aspect.
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the disclosure herein is not limited to the particular combination of features described above, but also encompasses other embodiments in which any combination of the features described above or their equivalents does not depart from the spirit of the disclosure. For example, a technical solution may be formed by replacing the above features with (but not limited to) features disclosed in this disclosure that have similar functions.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (19)

1. A data update method applied to a first computing node in a distributed system, the method comprising:
determining a data ID corresponding to the updated at least one data;
determining a hash value corresponding to each data ID according to a preset rule;
and configuring a bloom filter according to the hash value corresponding to each data ID, so that a second computing node in the distributed system determines the data needing to be updated according to the bloom filter.
2. The method of claim 1, wherein configuring the bloom filter based on the hash value corresponding to each data ID comprises:
for each data ID, determining an updating period of the corresponding data updating time, and determining a bloom filter corresponding to the data ID according to the updating period;
and configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID.
3. The method of claim 2, wherein each update period corresponds to a plurality of bloom filters;
correspondingly, determining the bloom filter corresponding to the data ID according to the update cycle includes:
and determining the bloom filter corresponding to the data ID from the plurality of bloom filters corresponding to the updating period according to the hash value corresponding to the data ID.
4. The method of claim 3, wherein determining the bloom filter corresponding to the data ID from the plurality of bloom filters corresponding to the update period according to the hash value corresponding to the data ID comprises:
determining the update time of the data corresponding to the data ID, and performing integer division of the update time by the duration of one update period to obtain period information, wherein the period information indicates the update period in which the update time of the data falls;
taking the hash value corresponding to the data ID modulo the number of bloom filters corresponding to one update period to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
5. The method according to claim 4, wherein the splicing the period information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID comprises:
splicing a preset prefix with the period information and the sequence information to obtain identification information of the bloom filter corresponding to the data ID;
the preset prefix is used for indicating that the identification information is identification information of a bloom filter.
6. The method of claim 2, wherein each bit of the bloom filter is initialized to a first value;
determining a hash value corresponding to each data ID according to a preset rule includes: for each data ID, applying a plurality of preset hash functions to the data ID to obtain a plurality of corresponding hash values;
configuring corresponding bits of the corresponding bloom filter according to the hash value corresponding to each data ID, including: and for each data ID, setting the H-th bit in the bloom filter corresponding to the data ID as a second numerical value according to a plurality of hash values corresponding to the data ID, wherein the value of H traverses the plurality of hash values corresponding to the data ID.
7. The method of claim 1, wherein the bloom filter is stored in a first storage device;
configuring a bloom filter according to the hash value corresponding to each data ID, comprising:
sending a configuration instruction to the first storage device, wherein the configuration instruction comprises identification information of the bloom filter to be updated and a hash value of the data ID corresponding to the bloom filter, so that the first storage device configures the corresponding bloom filter according to the configuration instruction.
8. The method of claim 7, wherein the first storage device is a Redis cluster and the configuration instruction is a bit manipulation instruction.
9. The method according to any one of claims 1-8, further comprising:
acquiring a data ID corresponding to at least one cached data;
determining a hash value corresponding to each data ID according to a preset rule;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system.
10. The method of claim 9, wherein for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter configured by other computing nodes in the distributed system comprises:
determining a bloom filter corresponding to at least one history updating period;
and for each cached data, determining whether the data needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one historical updating period.
11. The method of claim 10, wherein each update period corresponds to a plurality of bloom filters;
determining a bloom filter corresponding to at least one history update period, comprising:
for each history updating period, determining a plurality of bloom filters corresponding to the history updating period according to the history updating period and the number of the bloom filters in each updating period;
correspondingly, determining whether the data corresponding to the data ID needs to be updated according to the hash value corresponding to the data ID and the bloom filter corresponding to the at least one history updating period includes:
for each history updating period, searching a bloom filter corresponding to the data ID from bloom filters corresponding to the data ID in the history updating period according to the hash value corresponding to the data ID;
and determining whether the data corresponding to the data ID needs to be updated according to the searched bloom filter.
12. The method as claimed in claim 11, wherein searching for the bloom filter corresponding to the data ID from the corresponding bloom filters in the history update period according to the hash value corresponding to the data ID comprises:
determining corresponding period information according to the historical updating period;
taking the hash value corresponding to the data ID modulo the number of bloom filters corresponding to one update period to obtain corresponding sequence information;
and splicing the periodic information and the sequence information to obtain the identification information of the bloom filter corresponding to the data ID.
13. The method of claim 12, wherein determining corresponding period information based on historical update periods comprises:
performing integer division of the current time by the duration of one update period;
subtracting the offset from the result of the integer division operation to obtain corresponding period information;
wherein the offset is determined by a number of cycles of an interval between the history update period and a current update period.
14. The method of claim 10, wherein the at least one historical update period is the previous update period, and wherein the first computing node determines, once per update period, whether there is data to update according to a bloom filter.
15. The method according to any one of claims 1-8, further comprising:
deleting, at every preset cleaning period, the bloom filters dated before a preset time; and/or
after the bloom filter is configured, sending an expiration time corresponding to the bloom filter to a first storage device storing the bloom filter, so that the first storage device automatically deletes the bloom filter according to the expiration time.
16. An apparatus for updating data, applied to a first computing node in a distributed system, the apparatus comprising:
the determining module is used for determining a data ID corresponding to the updated at least one data;
the calculation module is used for determining hash values corresponding to the data IDs according to preset rules;
and the configuration module is used for configuring the bloom filter according to the hash value corresponding to each data ID, so that the second computing node in the distributed system determines the data needing to be updated according to the bloom filter.
17. An electronic device, comprising: a memory and at least one processor;
the memory stores computer-executable instructions;
the at least one processor executing the computer-executable instructions stored by the memory causes the at least one processor to perform the data update method of any of claims 1-15.
18. A computer-readable storage medium having computer-executable instructions stored thereon, which when executed by a processor, implement the data update method of any one of claims 1-15.
19. A computer program product comprising a computer program, characterized in that the computer program realizes the method of any of claims 1-15 when executed by a processor.
CN202011474674.7A 2020-12-14 2020-12-14 Data updating method, device, equipment, storage medium and program product Pending CN112487009A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011474674.7A CN112487009A (en) 2020-12-14 2020-12-14 Data updating method, device, equipment, storage medium and program product

Publications (1)

Publication Number Publication Date
CN112487009A true CN112487009A (en) 2021-03-12

Family

ID=74917105

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011474674.7A Pending CN112487009A (en) 2020-12-14 2020-12-14 Data updating method, device, equipment, storage medium and program product

Country Status (1)

Country Link
CN (1) CN112487009A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345472A (en) * 2013-06-04 2013-10-09 北京航空航天大学 Redundancy removal file system based on limited binary tree bloom filter and construction method of redundancy removal file system
US20150220625A1 (en) * 2014-02-03 2015-08-06 Interdigital Patent Holdings, Inc. Methods and apparatus for conveying surveillance targets using bloom filters
CN105554122A (en) * 2015-12-18 2016-05-04 畅捷通信息技术股份有限公司 Information updating method, information updating device, terminal and server
CN107391770A (en) * 2017-09-13 2017-11-24 北京锐安科技有限公司 A kind of method, apparatus of processing data, equipment and storage medium
US20190370374A1 (en) * 2018-05-30 2019-12-05 Joshua Daniel Carter Bloom filter series
US10666427B1 (en) * 2019-06-11 2020-05-26 Integrity Security Services Llc Device update transmission using a bloom filter

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
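Wang Qian (王乾): "A Bloom filter lookup scheme that can accelerate longest-prefix matching" (可加速最长前缀匹配的布隆过滤查找方案), Communications Technology (通信技术), vol. 53, no. 7, 31 July 2020 (2020-07-31) *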

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486025A (en) * 2021-07-28 2021-10-08 北京腾云天下科技有限公司 Data storage method, data query method and device
CN113486025B (en) * 2021-07-28 2023-07-25 北京腾云天下科技有限公司 Data storage method, data query method and device
CN113704255A (en) * 2021-08-04 2021-11-26 深圳市蜜蜂互联网络科技有限公司 Data insertion method and device, and data verification method and device
CN114138808A (en) * 2021-12-07 2022-03-04 中国建设银行股份有限公司 Data updating method and device, electronic equipment and readable storage medium

Similar Documents

Publication Publication Date Title
CN112487009A (en) Data updating method, device, equipment, storage medium and program product
KR102376713B1 (en) Composite partition functions
KR101871383B1 (en) Method and system for using a recursive event listener on a node in hierarchical data structure
CN110798332A (en) Method and system for searching directory access group
CN111597403B (en) Method and device for constructing graph index, electronic equipment and storage medium
CN117130998A (en) Log information processing method, device, equipment and storage medium
CN112507676B (en) Method and device for generating energy report, electronic equipment and computer readable medium
CN115391605A (en) Data query method, device, equipment, computer readable medium and program product
CN114490718A (en) Data output method, data output device, electronic equipment and computer readable medium
CN110941683B (en) Method, device, medium and electronic equipment for acquiring object attribute information in space
CN114253730A (en) Method, device and equipment for managing database memory and storage medium
CN113971192A (en) Data processing method and device, readable medium and electronic equipment
CN113064704A (en) Task processing method and device, electronic equipment and computer readable medium
CN113760927A (en) Data processing method and device, electronic equipment and computer readable medium
CN115993942B (en) Data caching method, device, electronic equipment and computer readable medium
CN111580890A (en) Method, apparatus, electronic device, and computer-readable medium for processing features
CN115348260B (en) Information processing method, device, equipment and medium based on campus information security
CN112667607B (en) Historical data management method and related equipment
CN112035529B (en) Caching method, caching device, electronic equipment and computer readable storage medium
CN111209042B (en) Method, device, medium and electronic equipment for establishing function stack
CN110716885B (en) Data management method and device, electronic equipment and storage medium
US20230021513A1 (en) System and method for a content-aware and context-aware compression algorithm selection model for a file system
CN112035529A (en) Caching method and device, electronic equipment and computer readable storage medium
CN115391358A (en) Array updating method and device, electronic equipment and computer readable medium
CN117520399A (en) Data storage method, apparatus, electronic device, and computer readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination