CN113779014A - Data storage method, device, equipment and storage medium - Google Patents

Publication number
CN113779014A
CN113779014A CN202010522791.XA CN202010522791A CN113779014A CN 113779014 A CN113779014 A CN 113779014A CN 202010522791 A CN202010522791 A CN 202010522791A CN 113779014 A CN113779014 A CN 113779014A
Authority
CN
China
Prior art keywords
data
prefix
files
split
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010522791.XA
Other languages
Chinese (zh)
Inventor
李润辉
张文辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sangfor Technologies Co Ltd
Original Assignee
Sangfor Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sangfor Technologies Co Ltd
Priority to CN202010522791.XA
Publication of CN113779014A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating

Abstract

The embodiments of the invention provide a data storage method, a data storage apparatus, an electronic device and a computer storage medium. The method comprises: splitting data files that contain at least two different prefixes in a database by prefix to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes and the data files are files sorted by key; and performing data processing on the at least two split data files. Each split data file corresponds to only one prefix, and no data file corresponds to two or more prefixes, so that when data is processed the relevant data file can be handled directly by prefix, which reduces the overhead of data processing.

Description

Data storage method, device, equipment and storage medium
Technical Field
The present invention relates to information technology, and in particular, to a data storage method and apparatus, an electronic device, and a computer storage medium.
Background
In the related art, in the field of data storage, when an upper-layer service performs fragmentation operations on data, the data in a Sorted Sequence Table (SST) file of a RocksDB database is stored in order, sorted by key. As a result, some SST files may contain data from several data fragments, and when a data fragment operation is performed on such SST files, the data fragments that need to be processed cannot be processed directly.
Disclosure of Invention
The embodiments of the invention provide a data storage method, a data storage apparatus, an electronic device and a computer storage medium, which can solve the problem that some SST files contain data from a plurality of data fragments, so that operations on data fragments incur high overhead.
The embodiment of the invention provides a data storage method, which comprises the following steps:
splitting data files containing at least two different prefixes in a database according to prefixes to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes, and the data files are files sorted according to keys;
and carrying out data processing on the at least two split data files.
Optionally, the splitting, by prefix, of the data files containing at least two different prefixes in the database to obtain at least two split data files includes:
and splitting the data file containing at least two different prefixes into the at least two split data files which are arranged in order according to the sequence of the at least two different prefixes.
Optionally, the performing data processing on the at least two split data files includes: and deleting data of at least one of the at least two split data files, and/or inserting data of the at least two split data files.
Optionally, the deleting data of at least one of the at least two split data files includes:
and when a data deleting instruction containing a first prefix is received and the first prefix contains the prefix of at least one data file in the at least two split data files, deleting the data file corresponding to the first prefix in the at least two split data files.
Optionally, the performing data insertion on the at least two split data files includes:
when a data insertion instruction containing a data file to be inserted is received and the data file to be inserted contains a second prefix, determining a data insertion position according to the second prefix, and inserting the data file to be inserted into the determined insertion position.
Optionally, the determining a data insertion location according to the second prefix includes:
and determining a data insertion position according to the second prefix and the prefix sequence of the at least two split data files.
Optionally, the data file is an SST file.
An embodiment of the present invention further provides a data storage apparatus, where the apparatus includes: a splitting module and a processing module, wherein,
the splitting module is used for splitting data files containing at least two different prefixes in the database according to the prefixes to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes, and the data files are files which are sorted according to keys;
and the processing module is used for carrying out data processing on the at least two split data files.
Optionally, the splitting module is configured to split the data file containing the at least two different prefixes into the at least two split data files arranged in order according to the sequence of the at least two different prefixes.
Optionally, the processing module is configured to perform data deletion on at least one of the at least two split data files, and/or perform data insertion on the at least two split data files.
Optionally, the processing module is configured to delete a data file corresponding to a first prefix from the at least two split data files when a data deletion instruction including the first prefix is received and the first prefix includes a prefix of at least one data file of the at least two split data files.
Optionally, the processing module is configured to receive a data insertion instruction including a data file to be inserted, determine a data insertion position according to a second prefix when the data file to be inserted includes the second prefix, and insert the data file to be inserted into the determined insertion position.
Optionally, the processing module is configured to determine a data insertion location according to the second prefix and a prefix sequence of the at least two split data files.
Optionally, the data file is an SST file.
An embodiment of the present invention further provides an electronic device, including a processor and a memory for storing a computer program capable of running on the processor; wherein,
the processor is configured to execute any one of the above data storage methods when the computer program is run.
An embodiment of the present invention further provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the computer program implements any one of the data storage methods described above.
According to the data storage method, apparatus, electronic device and computer storage medium provided by the embodiments of the present invention, data files containing at least two different prefixes in a database are split by prefix to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes and the data files are files sorted by key; data processing is then performed on the at least two split data files. Each split data file corresponds to only one prefix, and no data file corresponds to two or more prefixes, so that when data is processed the relevant data file can be handled directly by prefix, which reduces the overhead of data processing.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic diagram illustrating the splitting process of SST files in RocksDB in the related art;
FIG. 2 is a schematic diagram illustrating the Log-Structured Merge (LSM) leveled mechanism of RocksDB in the related art;
FIG. 3 is a schematic diagram illustrating the process of adding a prefix in RocksDB for data management in the related art;
FIG. 4 is a schematic diagram illustrating the process of inserting data fragments into RocksDB in the related art;
FIG. 5 is a schematic diagram illustrating the process of data fragment insertion in the related art;
FIG. 6 is a flowchart illustrating a data storage method according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of a prefix splitting process according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a process for inserting data fragments according to an embodiment of the present invention;
FIG. 9 is a schematic structural diagram of a data storage apparatus according to an embodiment of the present invention;
FIG. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the accompanying drawings and examples. It should be understood that the examples provided herein are merely illustrative of the present invention and are not intended to limit the present invention. In addition, the following embodiments are provided as partial embodiments for implementing the present invention, not all embodiments for implementing the present invention, and the technical solutions described in the embodiments of the present invention may be implemented in any combination without conflict.
The term "and/or" herein is merely an association describing an associated object, meaning that three relationships may exist, e.g., a and/or B, may mean: a exists alone, A and B exist simultaneously, and B exists alone. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality, for example, including at least one of A, B, C, and may mean including any one or more elements selected from the group consisting of A, B and C.
For example, the data storage method provided by the embodiment of the present invention includes a series of steps, but the data storage method provided by the embodiment of the present invention is not limited to the described steps, and similarly, the data storage device provided by the embodiment of the present invention includes a series of modules, but the device provided by the embodiment of the present invention is not limited to include the explicitly described modules, and may also include modules that are required to be configured to acquire relevant information or perform processing based on the information.
The embodiments of the present invention can be applied to a key-value store (KV store) engine. A KV store, also called a key-value database, is a data storage paradigm designed for storing, retrieving and managing associative arrays, a data structure often called a dictionary or a hash table. The key-value storage engine may be RocksDB or another storage engine. RocksDB is a database developed on the basis of LevelDB that provides a backward-compatible LevelDB Application Programming Interface (API); it is an embedded KV store engine written in C++, and its keys and values may be binary streams. LevelDB is an open-source, persistent, single-machine KV database with high random-write and sequential read/write performance; its random-read performance is comparatively weak, so it is well suited to scenarios with few queries and many writes. LevelDB applies the file management policy of a Log-Structured Merge tree (LSM-tree), a disk-based data structure that, compared with a balanced multi-way search tree (B-tree), can significantly reduce disk-arm movement overhead and sustain high-speed insertion and/or deletion over long periods. An LSM-tree writes data sequentially into a series of smaller files, each containing the changes made within a short period of time. Each file is sorted before being written so that later retrieval is faster, and files are immutable: existing files are never updated, and every update is made by writing a new file. The LSM-tree periodically examines the files and merges them to reduce their number; index changes are thus deferred and processed in batches, and updates are migrated to disk efficiently in a manner similar to merge sort, which reduces the overhead of index insertion.
RocksDB is also optimized for flash storage and has very low latency.
In the related art, RocksDB uses SST files for persistent data storage, and the data stored in an SST file after compaction is fully sorted. The data is sorted by key and split only by size: when the data exceeds the size limit of one SST file, the remainder is written into the next SST file. Specifically, when the key-value pair corresponding to a key cannot be written completely into one SST file, the subsequent data is written into the next SST file. The keys in an SST file are organized in sorted order, so a key or an iterator position can be located by binary search.
Fig. 1 is a schematic diagram of the splitting process of SST files in RocksDB in the related art. As shown in Fig. 1, SST1 and SST2 are two different SST files, each 128 MB in size; 111, 112, 113, ... are keys in SST1, and 223, 224, 225, ... are keys in SST2. Because the values corresponding to keys 111, 112, ... may differ in size and the capacity of SST1 is limited, the number of keys held by SST1 varies with the sizes of the corresponding values. When the key-value pairs stored in SST1 occupy all of its storage space, subsequent key-value pairs are written into SST2.
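The size-only splitting described above can be illustrated with a minimal Python sketch. It is not RocksDB code: the function name `split_by_size`, the byte-size accounting and the 128-byte limit are assumptions chosen for illustration, and for simplicity an entry that does not fit is moved whole into the next file.

```python
from typing import List, Tuple

Entry = Tuple[str, bytes]  # (key, value); a hypothetical in-memory stand-in for an SST entry


def split_by_size(entries: List[Entry], max_bytes: int) -> List[List[Entry]]:
    """Pack key-sorted entries into files, opening a new file when the size limit is reached."""
    files: List[List[Entry]] = []
    current: List[Entry] = []
    used = 0
    for key, value in entries:
        size = len(key) + len(value)
        # Only the size decides where a file ends; the key content is never inspected.
        if current and used + size > max_bytes:
            files.append(current)
            current, used = [], 0
        current.append((key, value))
        used += size
    if current:
        files.append(current)
    return files


entries = sorted([("111", b"x" * 40), ("112", b"x" * 70),
                  ("223", b"x" * 30), ("224", b"x" * 50)])
files = split_by_size(entries, max_bytes=128)
file_keys = [[key for key, _ in f] for f in files]
```

Because the boundary depends only on accumulated size, a file boundary can fall anywhere in the key space, which is why a single file may later end up holding keys from more than one prefix.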
Further, RocksDB uses a leveled mechanism to prevent each compaction from involving a large number of files. The upper levels hold newly written data and the lower levels hold data written longer ago. The same key may exist in both an upper and a lower level, in which case the most recently written value prevails. The data within each level is arranged contiguously and in order. A query proceeds from the upper levels to the lower levels, and the first match found is the value actually stored. Compaction is a background task that merges some SST files into other SST files; in particular, compaction removes duplicate writes to the same key.
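The lookup order and the effect of compaction on duplicate keys can be sketched as follows. This is a simplified model in which each level is a plain dictionary; the names `lookup` and `compact` are hypothetical and do not correspond to RocksDB functions.

```python
from typing import Dict, List, Optional


def lookup(levels: List[Dict[str, str]], key: str) -> Optional[str]:
    """Search from the uppermost (newest) level downward; the first hit is the stored value."""
    for level in levels:
        if key in level:
            return level[key]
    return None


def compact(levels: List[Dict[str, str]]) -> Dict[str, str]:
    """Merge all levels into one sorted mapping, keeping only the newest value per key."""
    merged: Dict[str, str] = {}
    for level in reversed(levels):  # oldest first, so newer writes overwrite older ones
        merged.update(level)
    return dict(sorted(merged.items()))


levels = [
    {"111": "new"},                  # upper level: most recently written
    {"111": "old", "112": "v112"},   # lower level: written longer ago
]
```

In this model `lookup(levels, "111")` returns the upper-level value, and `compact(levels)` keeps a single entry for key 111.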
Fig. 2 is a schematic structural diagram of the LSM leveled mechanism of RocksDB in the related art. As shown in Fig. 2, the capacity of each level is different and increases level by level: level 1 holds 300 MB, level 2 holds 3 GB, level 3 holds 30 GB, and level 4 holds 300 GB. Specifically, data may be written as follows: data is first written into level 1; it is then determined whether level 2 is full and, if not, the data is moved from level 1 to level 2; it is next determined whether level 3 is full and, if not, the data is moved from level 2 to level 3; finally, it is determined whether level 4 is full and, if not, the data is moved from level 3 to level 4. As a result, level 1 (the uppermost level) holds the newest data and level 4 (the lowermost level) holds the data written longest ago.
Meanwhile, to make data in RocksDB easier to manage, the upper-layer service fragments the data and adds a corresponding prefix to the keys stored in the underlying RocksDB, so that the fragments can be managed and distinguished. Specifically, within the same RocksDB, different prefixes may be set for operations on the same key by different operators. For example, when operator A and operator B each operate on key 111, the prefix A may be added before key 111 for operator A's operation, and the prefix B may be added before key 111 for operator B's operation, so that the management operations of different operators can be told apart. The names of the prefixes are not specifically limited here; they may be characters such as "A" and "B", numbers such as "1" and "2", and so on.
Fig. 3 is a schematic diagram illustrating the process of adding a prefix in RocksDB for data management in the related art; as shown in Fig. 3, the prefix is added in front of the key.
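A minimal sketch of key prefixing is given below. The "/" separator and the helper name `prefixed_key` are assumptions made for illustration; the text above does not specify how the prefix and the key are joined.

```python
def prefixed_key(prefix: str, key: str, sep: str = "/") -> str:
    """Place the fragment prefix in front of the original key (separator is hypothetical)."""
    return f"{prefix}{sep}{key}"


# Operators A and B both write keys 111 and 112; the prefix keeps their data apart.
stored_keys = sorted(prefixed_key(p, k) for p in ("B", "A") for k in ("112", "111"))
```

Since the store sorts by the full key, all keys that share a prefix end up adjacent to one another, which is what makes prefix-based management of fragments possible.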
In the related art of data storage, because of the way RocksDB stores data, data in the underlying SST files is stored in key order, so some SST files may contain data from multiple data fragments. When a data fragment needs to be processed, for example deleted, the data in those SST files must be read and checked, which incurs high overhead.
Specifically, consider batch deletion in which an entire data fragment is deleted: a large amount of data is deleted at once, and the fragment should be deleted promptly so that space is released. RocksDB offers two deletion modes. In mode 1, batch deletion marks are issued, and the marked data is deleted during subsequent RocksDB compaction. In mode 2, the SST files storing the data are deleted directly according to the deleted data range. With mode 1, data cannot be deleted promptly, a certain amount of space amplification occurs during compaction, and the resource overhead of the Central Processing Unit (CPU) also increases. Mode 2 is fast, but it can only delete SST files that lie entirely within the deleted data range. When some SST files contain at least two different prefixes, that is, data from several data fragments, those files cannot be deleted when the data fragment with a given prefix is deleted, and data remains. For example, when the data fragment with prefix A is deleted, an SST file containing both prefix A and prefix B cannot be deleted.
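The residue left by mode 2 can be shown with a small model. Each file is represented as a sorted list of keys in a hypothetical "<prefix>/<key>" format, and `delete_whole_files` is an illustrative name, not a RocksDB call.

```python
from typing import List, Tuple

SstFile = List[str]  # sorted keys of the form "<prefix>/<key>" (hypothetical format)


def prefix_of(key: str) -> str:
    """Return the fragment prefix of a stored key."""
    return key.split("/", 1)[0]


def delete_whole_files(files: List[SstFile], prefix: str) -> Tuple[List[SstFile], List[str]]:
    """Mode 2 model: drop only files made up entirely of the prefix; report leftover keys."""
    kept = [f for f in files if not all(prefix_of(k) == prefix for k in f)]
    residual = [k for f in kept for k in f if prefix_of(k) == prefix]
    return kept, residual


files = [["A/111", "A/112"], ["A/113", "B/111"], ["B/112", "B/113"]]
kept, residual = delete_whole_files(files, "A")
```

Here the first file is removed outright, but the second file mixes prefixes A and B, so it must be kept and key "A/113" remains in the store.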
Fig. 4 is a schematic diagram illustrating the process of inserting a data fragment into RocksDB in the related art. As shown in Fig. 4, Prefix-1 represents N data files whose prefixes are all Prefix-1, Prefix-1 to Prefix-3 represents one data file whose prefixes lie between Prefix-1 and Prefix-3, and Prefix-3 represents N data files whose prefixes are all Prefix-3, where N may be an integer greater than or equal to 1. Prefix-2 denotes a data fragment to be inserted whose prefix is Prefix-2. One application scenario for inserting a data fragment is the following: when a RocksDB instance fails, the data in the failed RocksDB needs to be inserted into a normal RocksDB. The insertion then serves as a repair process for the data in the failed RocksDB, and that data is the data fragment to be repaired. The data fragment to be repaired may be segmented to generate SST files, and the generated SST files are then inserted into RocksDB, so that external data is restored to the local RocksDB. Because RocksDB uses an LSM storage engine with a leveled structure, data conflicts must be prevented: if a conflict is found when data is inserted through the data-receiving interface, the data is placed at an upper level, for example level 1 in Fig. 2, and is checked and processed when compaction takes place. The data-receiving interface here may be an ingest interface.
Further, in Fig. 4, the data fragments undergoing data repair, that is, the inserted data fragments, all have the same prefix, Prefix-2, and in theory do not overlap with the original data. However, because SST files spanning multiple prefixes exist, for example the Prefix-1 to Prefix-3 file in Fig. 4, when Prefix-2 is to be inserted the original RocksDB is considered to already contain an SST file covering Prefix-2, that is, conflicting data is determined to exist. The inserted data therefore cannot be placed at the bottom level, for example level 4 in Fig. 2. When a large data block is placed at an upper level, subsequent compaction suffers severe space amplification, and the amplification factor may even exceed 2.
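The conflict described above comes down to a key-range overlap test, sketched below. This is only a model of the idea: the `overlaps` helper, the "<prefix>/<key>" format and the sample ranges are assumptions, and the sketch does not reproduce RocksDB's actual ingestion logic.

```python
from typing import List, Tuple

KeyRange = Tuple[str, str]  # (smallest key, largest key) of one file, both inclusive


def overlaps(existing: KeyRange, incoming: KeyRange) -> bool:
    """Two closed key ranges overlap when each one starts before the other ends."""
    return existing[0] <= incoming[1] and incoming[0] <= existing[1]


existing_files: List[KeyRange] = [
    ("Prefix-1/001", "Prefix-1/500"),  # only Prefix-1
    ("Prefix-1/501", "Prefix-3/100"),  # spans Prefix-1 to Prefix-3
    ("Prefix-3/101", "Prefix-3/900"),  # only Prefix-3
]
incoming: KeyRange = ("Prefix-2/001", "Prefix-2/900")

conflicts = [f for f in existing_files if overlaps(f, incoming)]
```

The cross-prefix file contains no Prefix-2 key at all, yet its range covers every possible Prefix-2 key, so it is reported as a conflict and the incoming fragment cannot be placed at the same level.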
Fig. 5 is a schematic diagram of the process of inserting a data fragment in the related art. As shown in Fig. 5, when data files spanning prefixes exist in the storage space numbered 51 in level 5 and the storage space numbered 62 in level 6, a new data file cannot be inserted directly into either of those storage spaces.
In view of the above technical problem, in some embodiments of the present invention, a data storage method is provided.
Fig. 6 is a flowchart of a data storage method according to an embodiment of the present invention, and as shown in fig. 6, the flowchart may include:
step 601: splitting data files containing at least two different prefixes in a database according to prefixes to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes, and the data files are files sorted according to keys;
Here, the database may be a database indexed by key-value storage, for example RocksDB. The data file is a file sorted by key; specifically, it is a file in which data is stored in the order of pre-sorted keys in such a database, for example an SST file in RocksDB.
As an embodiment, the data file containing at least two different prefixes may be a data file whose prefixes lie between Prefix-1 and Prefix-3; that is, the prefix of every key in the data file is Prefix-1 or Prefix-3 or, where Prefix-2 exists, Prefix-2, which lies between Prefix-1 and Prefix-3. Splitting such a data file by prefix makes each split data file correspond to exactly one prefix. For example, a data file C whose prefixes lie between Prefix-1 and Prefix-3 (and which here does not contain Prefix-2) may be split into a data file A with prefix Prefix-1 and a data file B with prefix Prefix-3.
Fig. 7 is a schematic diagram of the splitting process of a data file according to an embodiment of the present invention. As shown in Fig. 7, Prefix-1 represents N data files whose prefixes are all Prefix-1, Prefix-1 to Prefix-3 represents a data file whose prefixes lie between Prefix-1 and Prefix-3, and Prefix-3 represents N data files whose prefixes are all Prefix-3, where N may be an integer greater than or equal to 1. Through splitting, the one data file corresponding to Prefix-1 to Prefix-3 is split into a data file with prefix Prefix-1 and a data file with prefix Prefix-3.
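The prefix-based split of step 601 can be sketched in a few lines of Python. The "<prefix>/<key>" key format and the names `prefix_of` and `split_by_prefix` are assumptions for illustration; the embodiment itself does not prescribe an implementation.

```python
from itertools import groupby
from typing import List


def prefix_of(key: str) -> str:
    """Return the fragment prefix of a stored key (hypothetical "<prefix>/<key>" format)."""
    return key.split("/", 1)[0]


def split_by_prefix(sorted_keys: List[str]) -> List[List[str]]:
    """Split one key-sorted file into one file per prefix, preserving key order."""
    # Keys are already sorted, so keys with the same prefix are consecutive
    # and groupby yields exactly one group per prefix.
    return [list(group) for _, group in groupby(sorted_keys, key=prefix_of)]


file_c = ["Prefix-1/998", "Prefix-1/999", "Prefix-3/001", "Prefix-3/002"]
file_a, file_b = split_by_prefix(file_c)
```

Concatenating the resulting files in order reproduces the original file, so the overall key ordering is unchanged while every file now holds a single prefix.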
Step 602: performing data processing on the at least two split data files.
Here, performing data processing on the at least two split data files may mean processing each data file when a processing instruction for it is received. Specifically, when a processing request covering at least one of the at least two split data files is received, data processing may be performed on that data file; for example, when a processing request for the Prefix-1 data file is received, the Prefix-1 data file obtained after splitting may be processed. A processing request for the Prefix-1 data file may be implemented, for example, as a processing request issued by the upper-layer service for a data fragment and carrying the Prefix-1 prefix information. Such a request may of course also cause all other Prefix-1 data files, in addition to the Prefix-1 data file obtained after splitting, to be processed.
In practical applications, the steps 601 to 602 may be implemented by a Processor in an electronic Device, where the Processor may be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), an FPGA, a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor.
It can be seen that, in the data storage method provided in the embodiments of the present invention, the data files containing at least two different prefixes in the database are split by prefix to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes and the data files are files sorted by key, and data processing is performed on the at least two split data files. Because each split data file corresponds to only one prefix and no data file corresponds to two or more prefixes, the relevant data file can be processed directly by prefix, which reduces the overhead of data processing.
In an embodiment, the splitting, by prefix, of the data files containing at least two different prefixes in the database to obtain at least two split data files includes: splitting the data file containing at least two different prefixes into the at least two split data files arranged in order according to the sequence of the at least two different prefixes.
Here, splitting the data file containing at least two different prefixes into the at least two split data files arranged in order according to the sequence of the prefixes may be, for example, splitting the Prefix-1 to Prefix-3 data file into a Prefix-1 data file and a Prefix-3 data file, in the order of the prefixes Prefix-1 and Prefix-3. This ensures that the data remains sorted in key order.
In one embodiment, the performing data processing on the at least two split data files includes: and deleting data of at least one of the at least two split data files, and/or inserting data of the at least two split data files.
Here, deleting data of at least one of the at least two split data files may be, for example, deleting the data file with prefix Prefix-1 and/or the data file with prefix Prefix-3 obtained after splitting; inserting data may be inserting a Prefix-2 data file from outside RocksDB between the Prefix-1 data file and the Prefix-3 data file obtained after splitting.
In an example, the deleting data of at least one of the at least two split data files includes: and when a data deleting instruction containing a first prefix is received and the first prefix contains the prefix of at least one data file in the at least two split data files, deleting the data file corresponding to the first prefix in the at least two split data files.
Here, the first Prefix may refer to first Prefix information, for example, the first Prefix may be directly the Prefix-1, or may be any other character indicating the Prefix-1. The receiving of the data deletion instruction containing the first Prefix may be sending the data deletion instruction containing the first Prefix to the database by a manual operation, and the database receives the data deletion instruction containing the first Prefix, where the first Prefix contains a Prefix of at least one of the at least two split data files, and may be that the first Prefix contains a Prefix of at least one of a Prefix-1 data file and a Prefix-3 data file obtained after splitting, for example, the first Prefix may contain a Prefix-1 and/or a Prefix-3. Deleting the data file corresponding to the first Prefix from the at least two split data files, which may be, for example, determining the data file corresponding to the first Prefix involved in the deletion command from the at least two split data files, and deleting the determined data file, for example, when the first Prefix is Prefix-1, deleting the Prefix-1 data file from the Prefix-1 data file and the Prefix-3 data file obtained after splitting.
It can be understood that the data deletion instruction including the first Prefix may be a data deletion instruction of the upper layer service to the data fragment, at this time, not only the Prefix-1 data file obtained after splitting needs to be deleted, but also other data files with prefixes of Prefix-1 need to be deleted. It can be seen that, when data deletion is performed on at least one of the at least two split data files, all data to be deleted can be deleted quickly at one time, and the problems of space amplification and data residue do not exist.
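Once every file holds a single prefix, deletion by prefix reduces to dropping whole files. The sketch below models this with lists of keys in a hypothetical "<prefix>/<key>" format; `delete_by_prefix` is an illustrative name and assumes every file is non-empty and single-prefix.

```python
from typing import List


def prefix_of(key: str) -> str:
    """Return the fragment prefix of a stored key (hypothetical "<prefix>/<key>" format)."""
    return key.split("/", 1)[0]


def delete_by_prefix(files: List[List[str]], first_prefix: str) -> List[List[str]]:
    """Drop every file whose (single) prefix matches; no file has to be opened and filtered."""
    # Each file is assumed non-empty and single-prefix, so its first key identifies it.
    return [f for f in files if prefix_of(f[0]) != first_prefix]


split_files = [
    ["Prefix-1/001", "Prefix-1/002"],
    ["Prefix-1/998", "Prefix-1/999"],  # produced by splitting the cross-prefix file
    ["Prefix-3/001", "Prefix-3/002"],  # produced by splitting the cross-prefix file
    ["Prefix-3/500"],
]
remaining = delete_by_prefix(split_files, "Prefix-1")
leftover = [k for f in remaining for k in f if prefix_of(k) == "Prefix-1"]
```

Both Prefix-1 files disappear in one pass and `leftover` is empty, which corresponds to the absence of residual data noted above.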
In one embodiment, the performing data insertion on the at least two split data files includes:
when a data insertion instruction containing a data file to be inserted is received and the data file to be inserted contains a second prefix, determining a data insertion position according to the second prefix, and inserting the data file to be inserted into the determined insertion position.
Here, the data file to be inserted may be a data file outside RocksDB that requires repair, the data insertion instruction may be an instruction formed from that data file, and the second prefix may be the prefix of that data file. As an embodiment of determining the data insertion position according to the second prefix, when the second prefix is Prefix-2, the insertion position may be determined to lie between the data file with prefix Prefix-1 and the data file with prefix Prefix-3. The data file to be inserted, which has prefix Prefix-2, is then inserted at the determined position, that is, between the data file with prefix Prefix-1 and the data file with prefix Prefix-3.
It can be seen that, data insertion is performed on at least two split data files after splitting, and because no data file across prefixes exists, a file to be inserted can be directly inserted into any layer of an LSM layer and can be directly inserted into a position to be inserted without performing compact, thereby avoiding possible space method and resource consumption.
Fig. 8 is a schematic process diagram of the insertion of the data fragment according to the embodiment of the present invention, and as shown in fig. 8, since no data file across prefixes exists at level 0, level 5, and level 6, the data fragment to be inserted may be inserted into any layer.
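Why splitting allows insertion without compaction can be sketched as a key-range overlap check. This is an illustrative model only: each level is represented as a list of inclusive key ranges, and the `"<prefix>:<rest>"` key layout, the `overlaps` helper, and the `can_insert_without_compaction` helper are assumptions, not RocksDB APIs.

```python
from typing import List, Tuple

KeyRange = Tuple[str, str]  # (smallest key, largest key), both inclusive


def overlaps(a: KeyRange, b: KeyRange) -> bool:
    """Two inclusive key ranges overlap if neither ends before the other starts."""
    return a[0] <= b[1] and b[0] <= a[1]


def can_insert_without_compaction(level: List[KeyRange], new_file: KeyRange) -> bool:
    """A file can be placed in a level directly if it overlaps no existing file."""
    return not any(overlaps(existing, new_file) for existing in level)


new_file = ("Prefix-2:a", "Prefix-2:z")

# Before splitting: one file spans Prefix-1 through Prefix-3.
unsplit_level = [("Prefix-1:a", "Prefix-3:z")]
# After splitting: one file per prefix.
split_level = [("Prefix-1:a", "Prefix-1:z"), ("Prefix-3:a", "Prefix-3:z")]

print(can_insert_without_compaction(unsplit_level, new_file))  # False
print(can_insert_without_compaction(split_level, new_file))    # True
```

In the unsplit case the Prefix-2 range falls inside the cross-prefix file, so the two would have to be merged first; in the split case there is a gap between the Prefix-1 and Prefix-3 files that the new file fits into.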
In one embodiment, said determining a data insertion location from said second prefix comprises: and determining a data insertion position according to the second prefix and the prefix sequence of the at least two split data files.
As an embodiment, determining the data insertion position according to the second prefix and the prefix sequence of the at least two split data files may mean determining the order relationship between the second prefix and the prefixes of the at least two split data files, and then determining the data insertion position according to that order relationship. For example, when the second prefix is the letter B and the prefix sequence of the at least two split data files is the letter A followed by the letter C, it is determined that the letter B falls between the letter A and the letter C, and therefore the data insertion position is between the data file with the prefix A and the data file with the prefix C.
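Locating the insertion position from the prefix order can be sketched with a binary search. The following Python example is an assumption-laden illustration rather than the patented method: it models the split data files only by their sorted prefixes, and `insertion_index` is a hypothetical helper built on the standard `bisect` module.

```python
import bisect
from typing import List


def insertion_index(existing_prefixes: List[str], second_prefix: str) -> int:
    """Return the position at which a file with second_prefix belongs.

    existing_prefixes must already be sorted in ascending order, which holds
    when the split data files are arranged according to their prefix order.
    """
    return bisect.bisect_left(existing_prefixes, second_prefix)


prefixes = ["A", "C"]
idx = insertion_index(prefixes, "B")
prefixes.insert(idx, "B")
print(idx, prefixes)  # 1 ['A', 'B', 'C']
```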
In one embodiment, the data file is an SST file.
Fig. 9 is a schematic structural diagram of a data storage device according to an embodiment of the present invention. As shown in fig. 9, the data storage device may include: a splitting module 901 and a processing module 902; wherein,
the splitting module 901 is configured to split data files containing at least two different prefixes in a database according to prefixes, so as to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes, and the data file is a file sorted by key;
a processing module 902, configured to perform data processing on the at least two split data files.
Optionally, the splitting module 901 is configured to split the data file containing at least two different prefixes into the at least two split data files ordered according to the sequence of the at least two different prefixes.
Optionally, the processing module 902 is configured to perform data deletion on at least one of the at least two split data files, and/or perform data insertion on the at least two split data files.
Optionally, the processing module 902 is configured to, when a data deletion instruction including a first prefix is received and the first prefix includes a prefix of at least one data file of the at least two split data files, delete a data file corresponding to the first prefix from the at least two split data files.
Optionally, the processing module 902 is configured to receive a data insertion instruction including a data file to be inserted, and when the data file to be inserted includes a second prefix, determine a data insertion position according to the second prefix, and insert the data file to be inserted in the determined insertion position.
Optionally, the processing module 902 is configured to determine a data insertion position according to the second prefix and the prefix sequence of the at least two split data files.
Optionally, the data file is an SST file.
In practical applications, both the splitting module 901 and the processing module 902 may be implemented by a processor in an electronic device, and the processor may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor.
It can be seen that, in the data storage device provided in the embodiment of the present invention, the splitting module splits the data files containing at least two different prefixes in the database according to the prefixes to obtain at least two split data files, where each split data file corresponds to one of the at least two different prefixes, and the data files are files sorted by key; and the processing module performs data processing on the at least two split data files. Because each split data file corresponds to only one prefix and no data file corresponds to two or more prefixes, the corresponding data files can be processed directly according to their prefixes, which reduces the overhead of data processing.
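The splitting step performed by the splitting module can be sketched in Python as follows. This is a minimal illustration under stated assumptions: a data file is modeled as a key-sorted list of `(key, value)` records, keys are assumed to have the layout `"<prefix>:<rest>"` (the embodiment does not fix a key format), and `prefix_of` and `split_by_prefix` are hypothetical names.

```python
from itertools import groupby
from typing import List, Tuple

Record = Tuple[str, str]  # (key, value)


def prefix_of(key: str) -> str:
    # Assumed key layout "<prefix>:<rest>"; not specified by the source text.
    return key.split(":", 1)[0]


def split_by_prefix(data_file: List[Record]) -> List[List[Record]]:
    """Split one key-sorted file into one key-sorted file per prefix.

    Because the input is sorted by key, records sharing a prefix are adjacent,
    so grouping consecutive records is sufficient and preserves the order.
    """
    return [list(group) for _, group in groupby(data_file, key=lambda kv: prefix_of(kv[0]))]


mixed = [
    ("Prefix-1:a", "v1"),
    ("Prefix-1:b", "v2"),
    ("Prefix-3:a", "v3"),
]
split_files = split_by_prefix(mixed)
print(len(split_files))                           # 2
print([prefix_of(f[0][0]) for f in split_files])  # ['Prefix-1', 'Prefix-3']
```

The output files inherit the prefix order of the input, which matches the optional behavior in which the split data files are arranged according to the order of the prefixes.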
In addition, each functional module in this embodiment may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware or a form of a software functional module.
Based on this understanding, the technical solution of the present embodiment, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method of the present embodiment. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Specifically, the computer program instructions corresponding to the data storage method in the present embodiment may be stored on a storage medium such as an optical disk, a hard disk, or a USB flash drive, and when the computer program instructions corresponding to the data storage method in the storage medium are read or executed by an electronic device, any one of the data storage methods of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiment, referring to fig. 10, an electronic device provided by an embodiment of the present invention is shown, which may include: a memory 1001 and a processor 1002; wherein,
the memory 1001 for storing computer programs and data;
the processor 1002 is configured to execute the computer program stored in the memory to implement any one of the data storage methods of the foregoing embodiments.
In practical applications, the memory 1001 may be a volatile memory such as a RAM; or a non-volatile memory such as a ROM, a flash memory, a Hard Disk Drive (HDD) or a Solid-State Drive (SSD); or a combination of the above types of memories. The memory 1001 provides instructions and data to the processor 1002.
The processor 1002 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, and a microprocessor. It is to be understood that, for different devices, other electronic components may be used to implement the above-described processor function, and the embodiment of the present invention is not particularly limited in this respect.
In some embodiments, the functions of the apparatus provided in the embodiments of the present invention, or the modules included in the apparatus, may be used to execute the method described in the above method embodiments. For specific implementation, reference may be made to the description of the above method embodiments; for brevity, details are not repeated here.
The foregoing description of the various embodiments tends to emphasize the differences between the embodiments; for the same or similar parts, the embodiments may be referred to one another, and these parts are not repeated here for brevity.
The methods disclosed in the method embodiments provided by the present application can be combined arbitrarily without conflict to obtain new method embodiments.
Features disclosed in various product embodiments provided by the application can be combined arbitrarily to obtain new product embodiments without conflict.
The features disclosed in the various method or apparatus embodiments provided herein may be combined in any combination to arrive at new method or apparatus embodiments without conflict.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. A method of data storage, the method comprising:
splitting data files containing at least two different prefixes in a database according to prefixes to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes, and the data files are files sorted according to keys;
and carrying out data processing on the at least two split data files.
2. The method of claim 1, wherein splitting, according to prefixes, the data files containing at least two different prefixes in the database to obtain at least two split data files comprises:
and splitting the data file containing at least two different prefixes into the at least two split data files which are arranged in order according to the sequence of the at least two different prefixes.
3. The method according to claim 1, wherein the performing data processing on the at least two split data files comprises:
and deleting data of at least one of the at least two split data files, and/or inserting data of the at least two split data files.
4. The method of claim 3, wherein the deleting data of at least one of the at least two split data files comprises:
and when a data deleting instruction containing a first prefix is received and the first prefix contains the prefix of at least one data file in the at least two split data files, deleting the data file corresponding to the first prefix in the at least two split data files.
5. The method of claim 3, wherein the performing data insertion on the at least two split data files comprises:
when a data insertion instruction containing a data file to be inserted is received and the data file to be inserted contains a second prefix, determining a data insertion position according to the second prefix, and inserting the data file to be inserted into the determined insertion position.
6. The method of claim 5, wherein said determining a data insertion location based on said second prefix comprises:
and determining a data insertion position according to the second prefix and the prefix sequence of the at least two split data files.
7. The method according to any one of claims 1 to 6, wherein the data file is a Sorted String Table (SST) file.
8. A data storage device, characterized in that the device comprises: a splitting module and a processing module, wherein,
the splitting module is used for splitting data files containing at least two different prefixes in the database according to the prefixes to obtain at least two split data files, wherein each split data file corresponds to one of the at least two different prefixes, and the data files are files which are sorted according to keys;
and the processing module is used for carrying out data processing on the at least two split data files.
9. An electronic device comprising a processor and a memory for storing a computer program operable on the processor; wherein,
the processor is configured to perform the data storage method of any one of claims 1-7 when the computer program is executed.
10. A computer storage medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the data storage method of any one of claims 1 to 7.
CN202010522791.XA 2020-06-10 2020-06-10 Data storage method, device, equipment and storage medium Pending CN113779014A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010522791.XA CN113779014A (en) 2020-06-10 2020-06-10 Data storage method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010522791.XA CN113779014A (en) 2020-06-10 2020-06-10 Data storage method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113779014A true CN113779014A (en) 2021-12-10

Family

ID=78834743

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010522791.XA Pending CN113779014A (en) 2020-06-10 2020-06-10 Data storage method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113779014A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103870492A (en) * 2012-12-14 2014-06-18 腾讯科技(深圳)有限公司 Data storing method and device based on key sorting
CN109388641A (en) * 2018-10-22 2019-02-26 无锡华云数据技术服务有限公司 Method, the equipment, medium of the common prefix of key in a kind of retrieval key value database
US10338972B1 (en) * 2014-05-28 2019-07-02 Amazon Technologies, Inc. Prefix based partitioned data storage

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination