CN109947775B - Data processing method and device, electronic equipment and computer readable medium - Google Patents


Info

Publication number
CN109947775B
Authority
CN
China
Prior art keywords
value
record
bit
target
storage structure
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910189061.XA
Other languages
Chinese (zh)
Other versions
CN109947775A (en)
Inventor
胡俊飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing ThreatBook Technology Co Ltd
Original Assignee
Beijing ThreatBook Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing ThreatBook Technology Co Ltd
Priority to CN201910189061.XA
Publication of CN109947775A
Application granted
Publication of CN109947775B

Abstract

The embodiments of the present application disclose a data processing method, a data processing apparatus, an electronic device, and a computer-readable medium. The method may be used for a log-structured merge tree (LSM tree) storage structure and comprises: in response to detecting that a merge operation is performed on target data in the LSM tree storage structure, obtaining a bit operation record related to the target data, wherein the target data comprises an initial operated value and an operation value; determining whether the bit operation record contains a record of a non-target operation; in response to determining that the bit operation record does not contain a record of a non-target operation, performing, according to a preset algorithm, the bit operations indicated by the bit operation record on the initial operated value and the operation value to generate a merged record; and storing the generated merged record into the LSM tree storage structure. By merging the bit operations indicated by the bit operation record of the target data, the number of data entries stored in the LSM tree storage structure can be reduced, which helps reduce the storage space occupied.

Description

Data processing method and device, electronic equipment and computer readable medium
Technical Field
The embodiment of the application relates to the technical field of computers, in particular to a data processing method, a data processing device, electronic equipment and a computer readable medium.
Background
In the existing field of NoSQL (Not Only SQL) databases, storage engines and databases based on the LSM (Log-Structured Merge) tree structure occupy a very important position. The LSM tree structure is a disk-based data structure. An LSM tree storage structure writes each database update into memory and into a WAL (Write-Ahead Log), thereby improving the write performance of the database. Meanwhile, through the node files of the structured merge tree, the number and size of data files can be reduced, and read performance is improved as much as possible. That is, an LSM tree storage structure typically trades some read performance for better write performance.
The functions of currently common LSM tree storage structures are usually simple. They generally include only functions such as writing, deleting, and querying a single value, while some databases also provide query functions for range scans.
Disclosure of Invention
The embodiment of the application provides a data processing method, a data processing device, electronic equipment and a computer readable medium.
In a first aspect, an embodiment of the present application provides a data processing method for a log-structured merge tree (LSM tree) storage structure, including: in response to detecting that a merge operation is performed on target data in the LSM tree storage structure, obtaining a bit operation record related to the target data, wherein the target data comprises an initial operated value and an operation value; determining whether the bit operation record contains a record of a non-target operation; in response to determining that the bit operation record does not contain a record of a non-target operation, performing, according to a preset algorithm, the bit operations indicated by the bit operation record on the initial operated value and the operation value to generate a merged record; and storing the generated merged record into the LSM tree storage structure.
In some embodiments, the non-target operation includes a bit operation other than an OR operation and/or an AND operation; and the bit operation record includes the operators and operation values of the bit operations.
In some embodiments, the non-target operation includes a bit operation other than an OR operation and an AND operation; and the operators in the bit operation record include an OR operator and/or an AND operator, wherein the OR operator characterizes an OR operation and the AND operator characterizes an AND operation.
In some embodiments, the preset algorithm includes at least one of the following: the result of ANDing the initial operated value with the first operation value and then ANDing with the second operation value is equal to the result of ANDing the initial operated value with the AND of the first operation value and the second operation value; or the result of ORing the initial operated value with the first operation value and then ORing with the second operation value is equal to the result of ORing the initial operated value with the OR of the first operation value and the second operation value; or the result of ANDing the initial operated value with the first operation value and then ORing with the second operation value is equal to the AND of two OR results, namely the OR of the initial operated value and the second operation value, and the OR of the first operation value and the second operation value; or the result of ORing the initial operated value with the first operation value and then ANDing with the second operation value is equal to the OR of two AND results, namely the AND of the initial operated value and the second operation value, and the AND of the first operation value and the second operation value.
In some embodiments, the data in the LSM tree storage structure is stored in the form of key-value pairs, with the target data being the value corresponding to the target key.
In some embodiments, storing the generated merged record into the LSM tree storage structure includes: establishing a correspondence between the initial operated value in the target data and the correspondingly generated merged record, and storing the merged record into the LSM tree storage structure.
In a second aspect, an embodiment of the present application provides a data processing apparatus for a log-structured merge tree (LSM tree) storage structure, including: an obtaining unit configured to obtain a bit operation record related to target data in the LSM tree storage structure in response to detecting that a merge operation is performed on the target data, wherein the target data includes an initial operated value and an operation value; a judging unit configured to determine whether the bit operation record contains a record of a non-target operation; a merging unit configured to, in response to determining that the bit operation record does not contain a record of a non-target operation, perform, according to a preset algorithm, the bit operations indicated by the bit operation record on the initial operated value and the operation value to generate a merged record; and a storage unit configured to store the generated merged record into the LSM tree storage structure.
In some embodiments, the non-target operation includes a bit operation other than an OR operation and/or an AND operation; and the bit operation record includes the operators and operation values of the bit operations.
In some embodiments, the non-target operation includes a bit operation other than an OR operation and an AND operation; and the operators in the bit operation record include an OR operator and/or an AND operator, wherein the OR operator characterizes an OR operation and the AND operator characterizes an AND operation.
In some embodiments, the preset algorithm includes at least one of the following: the result of ANDing the initial operated value with the first operation value and then ANDing with the second operation value is equal to the result of ANDing the initial operated value with the AND of the first operation value and the second operation value; or the result of ORing the initial operated value with the first operation value and then ORing with the second operation value is equal to the result of ORing the initial operated value with the OR of the first operation value and the second operation value; or the result of ANDing the initial operated value with the first operation value and then ORing with the second operation value is equal to the AND of two OR results, namely the OR of the initial operated value and the second operation value, and the OR of the first operation value and the second operation value; or the result of ORing the initial operated value with the first operation value and then ANDing with the second operation value is equal to the OR of two AND results, namely the AND of the initial operated value and the second operation value, and the AND of the first operation value and the second operation value.
In some embodiments, the data in the LSM tree storage structure is stored in the form of key-value pairs, with the target data being the value corresponding to the target key.
In some embodiments, the storage unit is further configured to: establish a correspondence between the initial operated value in the target data and the correspondingly generated merged record, and store the merged record into the LSM tree storage structure.
In a third aspect, an embodiment of the present application provides an electronic device, including: a processor; a storage device having a computer program stored thereon; the processor, when executing the computer program on the storage means, causes the electronic device to carry out the data processing method as described in any of the embodiments of the first aspect above.
In a fourth aspect, the present application provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the data processing method described in any one of the embodiments in the first aspect.
According to the data processing method, the data processing apparatus, the electronic device, and the computer-readable medium provided by the embodiments of the present application, when a merge operation on target data in the LSM tree storage structure is detected, the bit operation record related to the target data can be obtained, where the target data may include an initial operated value and an operation value. When it is determined that the bit operation record does not include a record of a non-target operation, the bit operations indicated by the bit operation record are performed on the initial operated value and the operation value according to a preset algorithm, thereby generating a merged record. The generated merged record may then be stored in the LSM tree storage structure. In this way, the number of data entries stored in the LSM tree storage structure may be reduced, which helps reduce the storage space occupied.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present application may be applied;
FIG. 2 is a flow diagram of one embodiment of a data processing method provided herein;
FIG. 3 is a flow chart of yet another embodiment of a data processing method provided herein;
FIG. 4 is a flow diagram of one embodiment of a merge operation provided herein;
FIG. 5 is a schematic structural diagram of an embodiment of a data processing apparatus provided in the present application.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
Fig. 1 shows an exemplary system architecture 100 to which the data processing method or data processing apparatus of the embodiments of the present application may be applied.
As shown in fig. 1, system architecture 100 may include clients 101, 102, a network 103, and a server 104. Network 103 may be the medium used to provide communications links between clients 101, 102 and server 104. Network 103 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
Users may use clients 101, 102 to interact with server 104 over network 103 to receive or send messages, etc. For example, a user may send a data update request to the server 104 via the client 101, 102. The data update request may be used to characterize at least one of writing data to the LSM tree storage structure, deleting or modifying existing data therein, and so on. The clients 101, 102 may have various client applications installed thereon, such as a database management application, a browser, a shopping application, an instant messenger, and the like.
Here, the clients 101 and 102 may be hardware or software. When the clients 101, 102 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, desktop computers, and the like. When the clients 101, 102 are software, they can be installed in the electronic devices listed above. They may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not particularly limited herein.
The server 104 may be a server that provides various services. For example, it may be a background server that provides support for applications installed on the clients 101, 102. As another example, it may be a database server or the like having an LSM tree storage structure. The database server may record and store the data update requests sent by the clients 101, 102, thereby generating bit operation records. When data needs to be merged, the bit operation records related to the data can then be obtained, analyzed, and processed. The results of the analysis processing (e.g., the generated merged record) may be stored in a local database. Further, the analysis processing result may also be sent to the clients 101, 102 to be presented to the user.
Here, the server 104 may be hardware or software. When the server 104 is hardware, it may be implemented as a distributed server cluster composed of multiple servers, or as a single server. When the server 104 is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module, which is not particularly limited herein.
It should be noted that the data processing method provided by the embodiment of the present application is generally executed by the server 104. Accordingly, data processing devices are also typically provided in the server 104.
It should be understood that the number of clients, networks, and servers in FIG. 1 is merely illustrative. There may be any number of clients, networks, and servers, as desired for an implementation.
Referring to FIG. 2, a flow 200 of an embodiment of a data processing method provided by the present application is shown. The data processing method can be used for a log-structured merge tree (LSM tree) storage structure and comprises the following steps:
Step 201, in response to detecting a merge operation performed on target data in the LSM tree storage structure, obtain a bit operation record associated with the target data.
In this embodiment, an execution body of the data processing method (e.g., the server 104 shown in FIG. 1) may obtain a bit operation record associated with target data when detecting that a merge operation is performed on the target data in an LSM tree storage structure (e.g., a storage engine and/or database such as HBase, the news db, etc.). The target data may be any data in the LSM tree storage structure that may be modified. For example, the target data may be data on a specified leaf node in the LSM tree. The target data here may include an initial operated value and an operation value. A bit operation here is generally a bitwise operation performed on the bytes of specified data (a value), such as an OR operation (which may be represented by the "|" symbol) or an AND operation (which may be represented by the "&" symbol).
In this embodiment, the initial operated value is generally an original value, and the operation value is a value with which a bit operation is performed on the initial operated value. For example, if the initial operated value is X and the operation value is V, the result of performing an AND operation on X is X & V.
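The X & V example can be sketched in a few lines of Python. This is only an illustration: the `(operator, operation value)` tuple shape and the function name `apply_bit_ops` are assumptions made here, not structures defined in the application.

```python
from typing import List, Tuple

# Hypothetical record shape: (operator, operation value), e.g. ("&", 0b0110).
BitOp = Tuple[str, int]


def apply_bit_ops(initial: int, ops: List[BitOp]) -> int:
    """Apply each recorded bit operation to the initial operated value, in order."""
    value = initial
    for operator, operand in ops:
        if operator == "&":
            value &= operand
        elif operator == "|":
            value |= operand
        else:
            raise ValueError(f"unsupported operator: {operator!r}")
    return value


X, V = 0b1010, 0b0110
and_result = apply_bit_ops(X, [("&", V)])  # X & V
or_result = apply_bit_ops(X, [("|", V)])   # X | V
```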
Here, to better explain the principle of the LSM tree storage structure, assume by way of example that there are 1000 nodes with random values. For a disk, it is usually fastest to write the 1000 nodes sequentially. However, since the data is then completely unordered on the disk, each read requires a full scan. Therefore, to keep read performance as high as possible, the data must be ordered on the disk, which is the principle of the B+ tree (a tree data structure that includes a root node, internal nodes, and leaf nodes) storage structure. However, keeping the data ordered generates a large amount of random IO (input/output) during writing, so writes are slowed down by disk seeks.
To overcome this weakness of the B+ tree storage structure, the LSM tree storage structure was introduced. This storage structure is essentially a balance between reading and writing. Compared with a B+ tree storage structure, it sacrifices some read performance in exchange for greatly improved write performance. The principle is to split one big tree into N small trees. Data is first written into memory (memory has no seek cost, so random-write performance is greatly improved), where an ordered treelet is built. As the treelet grows, the treelet in memory is flushed to the disk (i.e., the contents of memory are forcibly written out and the in-memory data is cleared). Here, the treelet in memory is the MemStore, and after each flush the MemStore in memory becomes a new StoreFile on the disk. When reading, all treelets must be traversed, since it is not known which treelet holds the data. However, the data inside each treelet is usually ordered.
It should be noted that, in the LSM tree storage structure, since data is generally written into memory first, the data in memory is often lost if power is cut. Therefore, a WAL is typically used to protect the data in memory. That is, a log file (LogFile) is first recorded on the disk. Once the data in memory has been flushed to the disk, the corresponding LogFile can be discarded.
In this embodiment, when a treelet in the memory of the execution body needs to be flushed to the disk, this is treated as detecting a merge operation performed on target data in the LSM tree storage structure. The target data in this case may be the data indicated by the treelet that needs to be flushed to the disk, and the bit operation record associated with the target data may be a record describing a series of bit operations performed on the target data.
It can be understood that the condition for flushing the treelet in the memory onto the disk, that is, the condition for starting the merge operation, is not limited in the present application. For example, the number of nodes of the treelets in the memory reaches a preset value (e.g., 15), or the storage space occupied by the treelets in the memory reaches a preset threshold, and the like.
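The write path described above (log first, then memory, then a flush once a threshold is reached) might look roughly like the following toy sketch. The class name `TinyLSM`, the JSON file layout, and the entry-count threshold are all assumptions chosen for brevity; they do not reflect HBase or any real storage engine.

```python
import json
import os
import tempfile


class TinyLSM:
    """Toy LSM-style store: WAL + in-memory table + sorted segment files."""

    def __init__(self, directory, flush_threshold=3):
        self.directory = directory
        self.flush_threshold = flush_threshold  # hypothetical flush condition
        self.memtable = {}                      # the in-memory "treelet"
        self.segments = []                      # flushed files, oldest first
        self.wal_path = os.path.join(directory, "wal.log")

    def put(self, key, value):
        # Record the update in the write-ahead log before touching memory.
        with open(self.wal_path, "a", encoding="utf-8") as wal:
            wal.write(json.dumps([key, value]) + "\n")
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the in-memory data out in key order, then clear memory.
        path = os.path.join(self.directory, f"segment-{len(self.segments)}.json")
        with open(path, "w", encoding="utf-8") as f:
            json.dump(sorted(self.memtable.items()), f)
        self.segments.append(path)
        self.memtable.clear()
        # The flushed data is now on disk, so the log can be discarded.
        if os.path.exists(self.wal_path):
            os.remove(self.wal_path)

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        # It is unknown which segment holds the key, so scan newest to oldest.
        for path in reversed(self.segments):
            with open(path, encoding="utf-8") as f:
                for k, v in json.load(f):
                    if k == key:
                        return v
        return None


demo_dir = tempfile.mkdtemp()
db = TinyLSM(demo_dir, flush_threshold=2)
db.put("a", 1)
db.put("b", 2)  # reaches the threshold, so the memtable is flushed
db.put("a", 3)  # newer value stays in memory and in a fresh WAL
```

In this sketch the flush is the point at which a merge of pending bit operation records would be triggered.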
Step 202, determine whether the bit operation record contains a record of a non-target operation.
In the present embodiment, for the bit operation record acquired in step 201, the execution body may determine whether it includes a record of a non-target operation. For example, the execution body may match (the identifier of) a non-target operation against (the identifiers of) the bit operations indicated by the bit operation record. If no match is found, the bit operation record does not include a record of a non-target operation. Otherwise, the bit operation record includes a record of a non-target operation. Here, if the execution body determines that the bit operation record does not include a record of a non-target operation, it may continue to perform step 203.
It should be noted that the non-target operation is not limited in this application and may be set according to the actual situation. Optionally, the non-target operation may include a bit operation other than an OR operation and/or an AND operation. That is, the non-target operation may be any bit operation that is not an OR operation and/or an AND operation.
In some embodiments, the bit operation record may include the operators and operation values of the bit operations, where an operator characterizes the type of a bit operation. In this way, the execution body can determine whether the bit operation record includes a record of a non-target operation by checking whether any operator in the bit operation record matches an operator of a non-target operation.
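The operator check of step 202 can be illustrated as follows. This is a sketch under the assumption that each record is an `(operator, operation value)` tuple and that only `&` and `|` count as target operations; the names `TARGET_OPERATORS` and `contains_non_target` are hypothetical.

```python
# Assumed set of target operators: AND and OR only.
TARGET_OPERATORS = {"&", "|"}


def contains_non_target(ops):
    """Return True if any record uses an operator outside the target set."""
    return any(operator not in TARGET_OPERATORS for operator, _ in ops)


only_and_or = [("&", 0b1100), ("|", 0b0011)]
with_xor = [("&", 0b1100), ("^", 0b0101)]  # XOR is treated as non-target here
```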
Step 203, in response to determining that the bit operation record does not include a record of a non-target operation, performing a bit operation indicated by the bit operation record on the initial operated value and the operation value according to a preset algorithm to generate a merged record.
In this embodiment, when it is determined that the bit operation record does not include a record of a non-target operation, the execution body may perform the bit operations indicated by the bit operation record on the initial operated value and the operation value according to a preset algorithm, thereby generating the merged record. The preset algorithm is used to merge bit operation records so as to reduce the number of entries in the generated merged record, i.e., to reduce the number of bit operations contained in the merged record. The preset algorithm can be set according to the different bit operations involved.
Step 204, store the generated merged record into the LSM tree storage structure.
In this embodiment, the execution body may store the merged record generated in step 203 into the LSM tree storage structure. The merged record may be stored, for example, on a corresponding local disk.
It should be noted that, in the conventional LSM tree storage structure, read performance may degrade as the number of treelets increases. Therefore, the treelets on the disk need to be merged at an appropriate time, so that multiple small trees are combined into one big tree.
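Steps 202 to 204 can be strung together in a simplified sketch. To keep it short, the merge step here only collapses adjacent records that share an operator (the associative case); the dict-based `store`, the stored record layout, and the function names are assumptions for illustration, not the application's data format.

```python
TARGET_OPERATORS = {"&", "|"}  # assumed target operators


def fold_same_operator(ops):
    """Collapse adjacent records with the same operator into one record."""
    merged = []
    for operator, operand in ops:
        if merged and merged[-1][0] == operator:
            previous = merged[-1][1]
            combined = previous & operand if operator == "&" else previous | operand
            merged[-1] = (operator, combined)
        else:
            merged.append((operator, operand))
    return merged


def merge_and_store(store, key, initial, ops):
    # Step 202: skip merging if any non-target operation is recorded.
    if any(operator not in TARGET_OPERATORS for operator, _ in ops):
        return False
    # Step 203: merge the bit operation records.
    merged = fold_same_operator(ops)
    # Step 204: store the merged record alongside the initial operated value.
    store[key] = {"initial": initial, "ops": merged}
    return True


store = {}
stored = merge_and_store(
    store, "k", 0b1111, [("&", 0b1100), ("&", 0b0110), ("|", 0b0001)]
)
skipped = merge_and_store(store, "j", 0b1111, [("^", 0b0001)])
```

Three records are reduced to two in this example, while the record list containing `^` is left unmerged.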
In contrast, the present embodiment provides a data processing method, when a merge operation performed on target data in the LSM tree storage structure is detected, a bit operation record related to the target data can be obtained. And in the case that the record of the non-target operation is determined not to be included in the bit operation record, performing the bit operation indicated by the bit operation record on the initial operated value and the operation value according to a preset algorithm, thereby generating the merged record. And the generated merged record may be stored in the LSM tree storage structure. That is, before merging the treelets in the disk by the conventional method, the data processing method in this embodiment may merge the relevant bit operation records when the treelets in the memory are flushed to the disk. In this way, the number of entries of the data storage in the LSM tree storage structure may be reduced, thereby helping to reduce the footprint of the storage space. In addition, the reading performance of the storage structure is also improved.
With continued reference to fig. 3, a flow 300 of yet another embodiment of a data processing method provided herein is shown. In this embodiment, the data in the LSM tree storage structure may be stored in a key-value pair (key-value) form, and the data processing method may include the following steps:
Step 301, in response to detecting a merge operation performed on target data in the LSM tree storage structure, obtain a bit operation record associated with the target data.
In this embodiment, an execution body of the data processing method (e.g., the server 104 shown in FIG. 1) may obtain a bit operation record related to target data when detecting that a merge operation is performed on the target data in the LSM tree storage structure. The target data may be the value corresponding to a target key and may include an initial operated value and an operation value. The bit operation record may be a record describing a series of bit operations performed on the value of the target key, and may include the operators and operation values of the bit operations. Reference may be made to the related description of step 201 in the embodiment of FIG. 2, and details are not repeated here.
The target key may be any key among the key-value pairs in the LSM tree storage structure. It will be appreciated that when data in the LSM tree storage structure is updated, it is typically the value of a key that is updated, while the key itself usually remains unchanged. That is, the same key corresponds to different values over time.
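Because the same key accumulates a series of updates, obtaining the bit operation record for a target key amounts to collecting that key's operations in order. A minimal sketch, assuming a hypothetical flat log of `(key, operator, operation value)` entries:

```python
from collections import defaultdict


def group_records_by_key(log):
    """Group logged bit operations by key, preserving their original order."""
    grouped = defaultdict(list)
    for key, operator, operand in log:
        grouped[key].append((operator, operand))
    return dict(grouped)


# Hypothetical update log: entries for two keys are interleaved.
log = [
    ("user:1", "|", 0b0001),
    ("user:2", "&", 0b1110),
    ("user:1", "|", 0b0100),
    ("user:1", "&", 0b0101),
]
records = group_records_by_key(log)
```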
Step 302, determine whether the bit operation record contains a record of a non-target operation.
In the present embodiment, for the bit operation record acquired in step 301, the execution body may determine whether it includes a record of a non-target operation. The non-target operation may include a bit operation other than an OR operation and an AND operation, i.e., the non-target operation is any bit operation that is neither an OR operation nor an AND operation. That is, the execution body may determine whether any operator in the bit operation record matches another operator (an operator other than the OR operator and the AND operator). The OR operator characterizes an OR operation, and the AND operator characterizes an AND operation.
Here, if no operator in the bit operation record matches such another operator, the bit operation record does not include a record of a non-target operation. In that case the bit operation record may contain an OR operator and/or an AND operator, and the execution body may proceed to step 303.
Step 303, in response to determining that the bit operation record does not include a record of a non-target operation, performing a bit operation indicated by the bit operation record on the initial operated value and the operation value according to a preset algorithm to generate a merged record.
In this embodiment, in the case where it is determined that the record of the non-target operation is not included in the bit operation record, the execution body may perform a bit operation shown in the bit operation record on the initial operated value and the operation value according to a preset algorithm, thereby generating the merged record.
It should be noted that, since the bit operation record may include an OR operator and/or an AND operator, when the bit operations indicated in the bit operation record are all AND operations, the preset algorithm may include: the result of ANDing the initial operated value with the first operation value and then ANDing with the second operation value is equal to the result of ANDing the initial operated value with the AND of the first operation value and the second operation value. That is, X & A & B = X & (A & B).
If all bit operations indicated in the bit operation record are OR operations, the preset algorithm may include: the result of ORing the initial operated value with the first operation value and then ORing with the second operation value is equal to the result of ORing the initial operated value with the OR of the first operation value and the second operation value. That is, X | A | B = X | (A | B). These two preset algorithms may be referred to as the associative-law algorithms for bit operations.
In addition, if the bit operations indicated in the bit operation record include both an AND operation and an OR operation, the preset algorithm may include: the result of ANDing the initial operated value with the first operation value and then ORing with the second operation value is equal to the AND of the OR of the initial operated value and the second operation value with the OR of the first operation value and the second operation value. That is, (X & A) | B = (X | B) & (A | B). Alternatively, the result of ORing the initial operated value with the first operation value and then ANDing with the second operation value is equal to the OR of the AND of the initial operated value and the second operation value with the AND of the first operation value and the second operation value. That is, (X | A) & B = (X & B) | (A & B). These two preset algorithms may be referred to as the exchange-law algorithms for bit operations (mathematically, they are the distributive laws of AND and OR).
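The four identities behind the preset algorithms can be checked exhaustively on small integers. The following sketch, with the hypothetical helper name `check_identities`, tests every combination of 4-bit values for X, A, and B; because AND and OR act on each bit independently, agreement on small widths is a reasonable sanity check for wider values as well.

```python
from itertools import product


def check_identities(bits=4):
    """Exhaustively verify the four merge identities for all `bits`-wide values."""
    values = range(1 << bits)
    for x, a, b in product(values, repeat=3):
        assert (x & a) & b == x & (a & b)        # associative law, AND
        assert (x | a) | b == x | (a | b)        # associative law, OR
        assert (x & a) | b == (x | b) & (a | b)  # AND followed by OR
        assert (x | a) & b == (x & b) | (a & b)  # OR followed by AND
    return True


identities_hold = check_identities()
```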
It can be understood that, when the number of the target data is greater than three, that is, the number of the bit operation items in the bit operation record is greater than two, the execution main body may perform the merging operation on the first three bit operations according to the sequence of the bit operations and according to the preset algorithm. The flow of the specific merge operation can be seen in fig. 4.
First, in step 401, the execution body may check whether more than two (i.e., at least three) bit operation records remain. If so, the execution body may execute step 402, i.e., obtain the first three of the remaining bit operation records. Next, in step 403, the execution body may determine whether the operators of the first two of these three records are the same. If they are the same, the execution body may execute step 404a, i.e., merge the first two bit operation records using the associative-law algorithm described above. If they are different, the execution body may execute step 404b, i.e., determine whether the operators of the last two of the three records are the same. If they are the same, the execution body may execute step 405a, i.e., merge those two bit operation records using the associative-law algorithm. If they are different, the operator of the first record must be the same as the operator of the third record, since only AND and OR operators occur. In this case, the execution body may execute step 405b, i.e., swap the last two bit operation records using the commutative-law (swap-law) algorithm described above. Then, in step 406, the execution body may merge the first bit operation record with the swapped second bit operation record using the associative-law algorithm. At this point, the three bit operation records have been merged into two. The execution body may then return to step 401. Once no more than two bit operation records remain, the execution body may execute step 407, i.e., end the merge operation.
As can be seen, the merge operation can be performed repeatedly according to the above flow until no further merging is possible. For example, X & A & B & C = (X & A & B) & C = (X & (A & B)) & C. Since the bit operation records include only OR operations and/or AND operations, they can be merged into a merged record of at most two records.
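The flow of steps 401 to 407 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the patented implementation: the record representation (a pair of an operator string `"&"` or `"|"` and an integer operation value) and the names `combine`, `merge_records` and `apply_records` are hypothetical. Note that the merge never touches the initial operated value.

```python
from typing import List, Tuple

# Assumed representation: (operator, operation value), operator is "&" or "|".
Record = Tuple[str, int]


def combine(op: str, a: int, b: int) -> int:
    """Apply a single AND or OR operator to two values."""
    return a & b if op == "&" else a | b


def merge_records(records: List[Record]) -> List[Record]:
    """Reduce an ordered list of AND/OR records to at most two records."""
    recs = list(records)
    while len(recs) > 2:                          # step 401
        (o1, a), (o2, b), (o3, c) = recs[:3]      # step 402
        if o1 == o2:                              # steps 403 and 404a
            head = [(o1, combine(o1, a, b)), (o3, c)]
        elif o2 == o3:                            # steps 404b and 405a
            head = [(o1, a), (o2, combine(o2, b, c))]
        else:                                     # steps 405b and 406 (o1 == o3)
            # ((Y o1 a) o2 b) o1 c == (Y o1 (a o1 c)) o2 (b o1 c)
            head = [(o1, combine(o1, a, c)), (o2, combine(o1, b, c))]
        recs = head + recs[3:]
    return recs                                   # step 407


def apply_records(x: int, records: List[Record]) -> int:
    """Apply the records in order to an initial operated value x."""
    for op, value in records:
        x = combine(op, x, value)
    return x


example = [("&", 0b1110), ("|", 0b0001), ("&", 0b0111), ("|", 0b1000)]
print(merge_records(example))  # [('&', 6), ('|', 9)]
```

For any initial operated value `x`, `apply_records(x, merge_records(example))` gives the same result as `apply_records(x, example)`, while only two records need to be stored instead of four.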
Step 304: establish a correspondence between the initial operated value in the target data and the correspondingly generated merged record, and store the merged record into the LSM tree storage structure.
In this embodiment, the execution body may establish a correspondence between the merged record generated in step 303 and the initial operated value in the target data, for example, by using the same target key or label. The execution body may then store the merged record for which the correspondence has been established into the LSM tree storage structure. The merged record may be stored at the node (on a treelet) where the target key is located, for example, on another treelet on disk that has the same target key.
It should be noted that, in this embodiment, all operations on a specified single key (e.g., the target key) are ordered and none are missing. The multi-version problem of the LSM tree storage structure is not addressed here. Typically, the initial operated value and the update data are stored in different files. In addition, the treelets on disk may be stored in a different file each time a flush is performed. In a conventional LSM tree storage structure, merge operations cannot be performed across files. The purpose of the data processing method in this embodiment is to merge all consecutive bit operation records without referring to the initial operated value. That is, with the data processing method in this embodiment, a merged record corresponding to the initial operated value can be obtained.
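As an illustration of the correspondence by target key, the sketch below shows one way a reader might later resolve a value when the initial operated value and the merged record sit in different files. The text does not spell out the read path, so everything here is an assumption: the two dictionaries are hypothetical stand-ins for separate on-disk files, and the key name and `read_value` function are invented for the example.

```python
from typing import Dict, List, Tuple

# Assumed representation: (operator, operation value), operator is "&" or "|".
Record = Tuple[str, int]

# Hypothetical stand-ins for two separate files that share the same target key.
base_file: Dict[str, int] = {"flags:host-1": 0b1011}
merged_file: Dict[str, List[Record]] = {
    "flags:host-1": [("&", 0b0110), ("|", 0b1001)],
}


def read_value(key: str) -> int:
    """Resolve a key by applying its merged record to the initial operated value."""
    value = base_file[key]
    for op, operand in merged_file.get(key, []):
        value = value & operand if op == "&" else value | operand
    return value
```

The merged record was produced without reading `base_file`; the initial operated value is only needed at the moment the final value is resolved.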
In the data processing method in this embodiment, all related bit operation (OR operation and AND operation) records can be merged into a merged record without actually performing an operation on the initial operated value. The merged record contains no more than two records, and the size of the value in each record generally remains constant. Therefore, the number of stored record entries can be greatly reduced, the occupation of storage space is reduced, and the data processing efficiency of the LSM tree storage structure is further improved.
In addition, practical application shows that the data processing method in the above embodiments meets usage requirements in terms of both time complexity and space complexity; that is, its complexity is within a normally usable range.
Referring now to Fig. 5, the present application further provides an embodiment of a data processing apparatus as an implementation of the method shown in the above figures. This apparatus embodiment corresponds to the method embodiments described above. The apparatus can be applied to various electronic devices.
As shown in fig. 5, the data processing apparatus 500 of the present embodiment may be used for a log structured merge tree LSM storage structure, including: an obtaining unit 501 configured to, in response to detecting that a merge operation is performed on target data in the LSM tree storage structure, obtain a bit operation record associated with the target data, where the target data includes an initial operated-on value and an operation value; a judging unit 502 configured to judge whether a record of a non-target operation is included in the bit operation record; a merging unit 503 configured to perform a bit operation indicated by the bit operation record on the initial operated value and the operation value according to a preset algorithm in response to determining that the record of the non-target operation is not included in the bit operation record, and generate a merged record; a storage unit 504 configured to store the generated merged record into the LSM tree storage structure.
In some embodiments, the non-target operation may include a bit operation other than an OR operation and/or an AND operation; and the bit operation record may include operators and operation values of bit operations.
Alternatively, the non-target operation may include a bit operation other than an OR operation and an AND operation; and the operators in the bit operation record may include an OR operator and/or an AND operator, wherein the OR operator represents an OR operation and the AND operator represents an AND operation.
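The check performed by the judging unit can be sketched as a simple scan over the operators in the bit operation record. This is a minimal illustration under assumptions: the record representation, the operator symbols (including `"^"` as an example of a non-target XOR operation) and the name `contains_non_target` are hypothetical.

```python
from typing import List, Tuple

# Assumed representation: (operator, operation value).
Record = Tuple[str, int]

# Assumption: AND and OR are the target operations; anything else is non-target.
TARGET_OPERATORS = frozenset({"&", "|"})


def contains_non_target(records: List[Record]) -> bool:
    """Return True if any record uses an operator other than AND or OR."""
    return any(op not in TARGET_OPERATORS for op, _ in records)
```

For example, `contains_non_target([("&", 1), ("^", 2)])` is true because of the XOR record, so the merge described above would not be applied to that record list.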
Further, the preset algorithm may include at least one of the following: ANDing the initial operated value with the first operation value and then ANDing the result with the second operation value is equal to ANDing the initial operated value with the AND result of the first operation value and the second operation value; or ORing the initial operated value with the first operation value and then ORing the result with the second operation value is equal to ORing the initial operated value with the OR result of the first operation value and the second operation value; or ORing the second operation value with the AND result of the initial operated value and the first operation value is equal to ANDing the OR result of the initial operated value and the second operation value with the OR result of the first operation value and the second operation value; or ANDing the second operation value with the OR result of the initial operated value and the first operation value is equal to ORing the AND result of the initial operated value and the second operation value with the AND result of the first operation value and the second operation value.
In some application scenarios, the data in the LSM tree storage structure may be stored in the form of key-value pairs, and the target data may be a value corresponding to a target key.
Optionally, the storage unit 504 may be further configured to: establish a correspondence between the initial operated value in the target data and the correspondingly generated merged record, and store the merged record into the LSM tree storage structure.
It will be understood that the units described in the apparatus 500 correspond to the various steps in the method described with reference to fig. 2 to 3. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.
It is to be noted that the flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be located in a processor. The names of the units do not, in some cases, constitute a limitation on the units themselves. For example, the obtaining unit may also be described as a "unit that obtains a bit operation record associated with the target data".
As another aspect, the present application also provides a computer-readable medium. The computer readable medium herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer-readable medium may be included in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable medium carries a computer program which, when executed by a processor, enables the electronic device to implement the data processing method as described in any of the embodiments above.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (10)

1. A data processing method for a log-structured merge tree LSM storage structure, comprising:
in response to detecting a merge operation performed on target data in an LSM tree storage structure, obtaining a bit operation record associated with the target data, wherein the target data comprises an initial operated-on value and an operation value;
judging whether the bit operation records contain non-target operation records or not;
in response to the fact that the bit operation record does not contain a record of non-target operation, according to a preset algorithm, carrying out bit operation indicated by the bit operation record on an initial operated value and an operation value to generate a combined record;
and storing the generated combined record into an LSM tree storage structure.
2. The method of claim 1, wherein the non-target operation comprises a bit operation other than an OR operation and/or an AND operation; and
the bit operation record comprises operators and operation values of bit operations.
3. The method of claim 2, wherein the non-target operation comprises a bit operation other than an OR operation and an AND operation; and
the operators in the bit operation record comprise an OR operator and/or an AND operator, wherein the OR operator represents an OR operation and the AND operator represents an AND operation.
4. The method of claim 3, wherein the predetermined algorithm comprises at least one of:
ANDing the initial operated value with the first operation value and then ANDing the result with the second operation value is equal to ANDing the initial operated value with the AND result of the first operation value and the second operation value; or
ORing the initial operated value with the first operation value and then ORing the result with the second operation value is equal to ORing the initial operated value with the OR result of the first operation value and the second operation value; or
ORing the second operation value with the AND result of the initial operated value and the first operation value is equal to ANDing the OR result of the initial operated value and the second operation value with the OR result of the first operation value and the second operation value; or
ANDing the second operation value with the OR result of the initial operated value and the first operation value is equal to ORing the AND result of the initial operated value and the second operation value with the AND result of the first operation value and the second operation value.
5. The method of any of claims 1-4, wherein the data in the LSM tree storage structure is stored in the form of key-value pairs, and wherein the target data is a value corresponding to a target key.
6. The method of claim 5, wherein storing the generated merged record in an LSM tree storage structure comprises:
establishing a correspondence between the initial operated value in the target data and the correspondingly generated merged record, and storing the merged record into the LSM tree storage structure.
7. A data processing apparatus for a log structured merge tree, LSM, storage structure, comprising:
an obtaining unit configured to obtain a bit operation record related to target data in an LSM tree storage structure in response to detecting a merge operation performed on the target data, wherein the target data comprises an initial operated-on value and an operation value;
a judging unit configured to judge whether a record of a non-target operation is included in the bit operation record;
a merging unit configured to perform a bit operation indicated by the bit operation record on an initial operated value and an operation value according to a preset algorithm in response to determining that the bit operation record does not include a record of a non-target operation, and generate a merged record;
a storage unit configured to store the generated merged record into an LSM tree storage structure.
8. The apparatus of claim 7, wherein the non-target operation comprises a bit operation other than an OR operation and an AND operation; and
the operators in the bit operation record comprise an OR operator and/or an AND operator, wherein the OR operator represents an OR operation and the AND operator represents an AND operation.
9. An electronic device, comprising:
a processor;
a storage device having a computer program stored thereon;
the processor, when executing the computer program on the storage means, causes the electronic device to carry out the data processing method according to one of claims 1 to 6.
10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the data processing method of one of claims 1 to 6.
CN201910189061.XA 2019-03-13 2019-03-13 Data processing method and device, electronic equipment and computer readable medium Active CN109947775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910189061.XA CN109947775B (en) 2019-03-13 2019-03-13 Data processing method and device, electronic equipment and computer readable medium

Publications (2)

Publication Number Publication Date
CN109947775A CN109947775A (en) 2019-06-28
CN109947775B true CN109947775B (en) 2021-03-23

Family

ID=67009681

Country Status (1)

Country Link
CN (1) CN109947775B (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant